BMC Research Notes
Comparison of insertion/deletion calling algorithms on human next-generation sequencing data
Alex R Paciorkowski [3], Emily Tuttle [2], Jason R Myers [1], Dalia H Ghoneim [2]
[1] Genomics Research Center, University of Rochester Medical Center, Rochester, NY, USA
[2] Center for Neural Development and Disease, University of Rochester Medical Center, 601 Elmwood Avenue, Rochester, NY, USA
[3] Departments of Neurology, Pediatrics, and Biomedical Genetics, University of Rochester Medical Center, Rochester, NY, USA
Keywords: Pindel; GATK; Concordance; Validation; Indels; Next generation sequencing
DOI: 10.1186/1756-0500-7-864
Received: 2014-07-31; Accepted: 2014-11-21; Published: 2014
Abstract

Background

Insertions/deletions (indels) are the second most common type of genomic variant and the most common type of structural variant. Identifying indels in next-generation sequencing data is challenging, and the algorithms commonly used for indel detection have not been compared on a research cohort of human subject genomic data. Guidelines for the optimal detection of biologically significant indels are limited. We analyzed three sets of human next-generation sequencing data (48 samples of 200-gene targeted exon sequencing, 45 samples of whole exome sequencing, and 2 samples of whole genome sequencing) using three indel detection algorithms: Pindel and the Genome Analysis Toolkit (GATK) UnifiedGenotyper and HaplotypeCaller.

Results

We observed variation in indel calls across the three algorithms. Indels called by all three tools comprised only 5.70% of targeted exon, 19.52% of whole exome, and 14.25% of whole genome indel calls. Most discordant indels had lower read depth and were likely false positives. When software parameters were kept consistent across the three targets, HaplotypeCaller produced the most reliable results. Pindel calls validated poorly unless parameters were adjusted to account for varied read depth and the number of samples per run. Adjusting Pindel's M (minimum support for event) parameter improved both concordance and validation rates. Pindel identified large deletions that exceeded the length limits of the GATK algorithms.
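To make the concordance figures concrete, here is a minimal sketch of computing the share of indel calls made by all three callers. It assumes that calls are matched exactly on chromosome, position, reference and alternate allele, and that the percentage is taken over the union of all tools' calls; the abstract does not state the exact matching rule, and the `Indel` type, `three_way_concordance` function and toy call sets are hypothetical.

```python
from typing import NamedTuple


class Indel(NamedTuple):
    """Hypothetical normalized indel call, keyed by position and alleles."""
    chrom: str
    pos: int
    ref: str
    alt: str


def three_way_concordance(a: set[Indel], b: set[Indel], c: set[Indel]) -> float:
    """Percentage of distinct calls (the union) that all three callers made.

    Assumes exact matching on (chrom, pos, ref, alt). Real comparisons usually
    left-align and normalize indels first, because different callers may
    report the same event at different positions.
    """
    union = a | b | c
    if not union:
        return 0.0
    shared = a & b & c
    return 100.0 * len(shared) / len(union)


# Hypothetical toy call sets, not data from the study.
pindel = {Indel("chr1", 100, "AT", "A"), Indel("chr1", 500, "G", "GC"),
          Indel("chr2", 42, "CTT", "C")}
unified = {Indel("chr1", 100, "AT", "A"), Indel("chr1", 500, "G", "GC")}
haplotype = {Indel("chr1", 100, "AT", "A"), Indel("chr3", 7, "A", "AG")}

if __name__ == "__main__":
    print(three_way_concordance(pindel, unified, haplotype))
```

With these toy sets, one of four distinct calls is shared by all three tools, so the script prints 25.0. Exact-key matching keeps the sketch simple but undercounts concordance whenever callers represent the same indel differently.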

Conclusions

Despite the observed variability in indel identification, we discerned strengths of the individual algorithms on specific data sets, which allowed us to suggest best practices for indel calling. The low validation rate of Pindel's indel calls in targeted exon sequencing suggests that HaplotypeCaller is better suited for short indels and multi-sample runs on targets with very high read depth. Pindel allows the minimum support for an event to be optimized and is best used to detect larger indels at lower read depths.
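The effect of raising a minimum-support threshold can be sketched as a post-hoc filter over call records. This is only an illustration under stated assumptions: Pindel applies its M parameter itself during calling, whereas the `Call` record, its `support` field, `filter_min_support`, and the toy calls below are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Call:
    """Hypothetical indel call carrying its number of supporting reads."""
    chrom: str
    pos: int
    length: int   # assumed convention: positive = insertion, negative = deletion
    support: int  # assumed field: reads supporting the event


def filter_min_support(calls: list[Call], min_support: int) -> list[Call]:
    """Keep calls backed by at least `min_support` reads.

    A post-hoc emulation of a minimum-support-for-event threshold such as
    Pindel's M; this is not Pindel's implementation.
    """
    if min_support < 1:
        raise ValueError("min_support must be >= 1")
    return [c for c in calls if c.support >= min_support]


# Hypothetical toy calls, not data from the study.
calls = [
    Call("chr1", 100, -1, 3),
    Call("chr1", 500, 1, 25),
    Call("chr5", 9000, -1200, 8),  # a larger deletion
    Call("chr7", 300, 2, 1),
]

if __name__ == "__main__":
    # Sweep the threshold to see how many calls survive at each value.
    for m in (1, 5, 10):
        print(m, len(filter_min_support(calls, m)))
```

Sweeping the threshold like this shows the trade-off the abstract describes: higher values discard weakly supported calls (likely false positives) but can also drop real events in low-depth data, so a suitable value depends on read depth and the number of samples per run.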

License

2014 Ghoneim et al.; licensee BioMed Central Ltd.

Figures

Figures 1 to 5 (images only; captions not included).
