BMC Genomics
Tangram: a comprehensive toolbox for mobile element insertion detection
Gabor T Marth [1], Mark A Batzer [3], Miriam K Konkel [3], Jerilyn A Walker [3], Alistair Ward [2], Wan-Ping Lee [2], Jiantao Wu [2]
[1] Department of Human Genetics and USTAR Center for Genetic Discovery, University of Utah, Salt Lake City, Utah, USA
[2] Department of Biology, Boston College, Chestnut Hill, MA, USA
[3] Department of Biological Sciences, Louisiana State University, Baton Rouge, LA, USA
Keywords: High-throughput sequencing; SVA; Alu; L1; Endogenous retrovirus; Retrotransposon; Mobile element insertion; Structural variation
DOI  :  10.1186/1471-2164-15-795
Received 2014-02-28; accepted 2014-09-03; published 2014.
Abstract

Background

Mobile elements (MEs) make up more than 50% of the human genome as a result of repeated insertion events during human genome evolution. Although most of these elements are now fixed in the population, some ME families, including Alu, L1, SVA and HERV-K elements, are still actively duplicating. Mobile element insertions (MEIs) have been associated with human genetic disorders, including Crohn's disease, hemophilia and various types of cancer, which motivates the need for accurate MEI detection methods. Comprehensively identifying and accurately characterizing these variants in whole-genome next-generation sequencing (NGS) data requires a computationally efficient detection and genotyping method. Current computational tools cannot call MEI polymorphisms with sufficiently high sensitivity and specificity, nor call individual genotypes with sufficiently high accuracy.

Results

Here we report Tangram, a computationally efficient MEI detection program that integrates read-pair (RP) and split-read (SR) mapping signals to detect MEI events. Because it uses SR mapping in its primary detection module, a feature unique to this software, Tangram can pinpoint MEI breakpoints with single-nucleotide precision. Understanding the role of MEI events in disease requires accurate individual genotypes in clinical samples, and Tangram determines sample genotypes with very high accuracy. Using simulations and experimental datasets, we demonstrate that Tangram has superior sensitivity, specificity, breakpoint resolution and genotyping accuracy compared with other recently developed MEI detection methods.
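The abstract does not describe how the two signals are combined, so the following is only a minimal illustrative sketch of the general idea, not Tangram's algorithm. It assumes that read-pair evidence consists of "anchor" reads mapped uniquely to the reference whose mates align to a mobile-element consensus, and that these anchors cluster into a coarse candidate window; split reads spanning the junction then supply an exact coordinate. The function names `cluster_anchors` and `refine_breakpoint` and the parameters `max_gap` and `flank` are hypothetical.

```python
from collections import Counter
from typing import List, Optional, Tuple


def cluster_anchors(positions: List[int], max_gap: int) -> List[Tuple[int, int]]:
    """Group anchor-read positions into candidate windows (hypothetical helper).

    Positions are reference coordinates of uniquely mapped mates whose
    partners align to a mobile-element consensus. Consecutive sorted
    positions no more than max_gap apart share a window.
    """
    if not positions:
        return []
    ordered = sorted(positions)
    clusters = []
    start = prev = ordered[0]
    for p in ordered[1:]:
        if p - prev > max_gap:
            # Gap too large: close the current window and open a new one.
            clusters.append((start, prev))
            start = p
        prev = p
    clusters.append((start, prev))
    return clusters


def refine_breakpoint(window: Tuple[int, int], split_positions: List[int],
                      flank: int) -> Optional[int]:
    """Pick the most frequently observed split-read junction near a window.

    Returns None when no split read falls within window +/- flank, i.e. the
    call would rest on read-pair evidence alone (coarse resolution).
    """
    lo, hi = window[0] - flank, window[1] + flank
    nearby = [p for p in split_positions if lo <= p <= hi]
    if not nearby:
        return None
    # The modal junction coordinate is taken as the single-base breakpoint.
    return Counter(nearby).most_common(1)[0][0]


if __name__ == "__main__":
    anchors = [100, 150, 180, 5000, 5030]
    splits = [170, 172, 172, 172, 5015, 9000]
    for w in cluster_anchors(anchors, max_gap=200):
        print(w, refine_breakpoint(w, splits, flank=100))
```

Run as a script, this prints `(100, 180) 172` and `(5000, 5030) 5015`: the read pairs localize each event to a window of tens of bases, and the split reads narrow it to one coordinate, which is the kind of gain in breakpoint resolution the abstract attributes to using SR signals.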

Conclusions

Tangram serves as the primary MEI detection tool in the 1000 Genomes Project. It is implemented as a highly portable, memory-efficient and easy-to-use C++ program, developed under an open-source model.

License

   
2014 Wu et al.; licensee BioMed Central Ltd.

Figures

Figure 1.

Figure 2.

Figure 3.

Figure 4.

Figure 5.
