BMC Bioinformatics
ITD assembler: an algorithm for internal tandem duplication discovery from short-read sequencing data
Methodology Article
Marek Kimmel, Liu Xi, Jie Li, David A. Wheeler, Richard A. Gibbs, Oliver A. Hampton, Sharon E. Plon, Navin Rustagi
Affiliations: Department of Statistics, Rice University, Houston, TX, USA; Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA; Department of Dermatology, Xiangya Hospital, Central South University, Hunan, China; Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA; Department of Pediatrics/Hematology-Oncology, Texas Children’s Hospital, Houston, TX, USA
Keywords: Tandem duplication; De Bruijn graphs; Assembly; FLT3; Data mining; Cancer genetics; AML; Clustering; Somatic mutations
DOI: 10.1186/s12859-016-1031-8
Received: 2015-09-15; accepted: 2016-04-12; published: 2016
Source: Springer
【 Abstract 】
Background: Detection of tandem duplications within coding exons, referred to as internal tandem duplications (ITDs), remains challenging due to inefficiencies in aligning ITD-containing reads to the reference genome. There is a critical need for efficient methods to recover these important mutational events.

Results: In this paper we introduce ITD Assembler, a novel approach that rapidly evaluates all unmapped and partially mapped reads from whole-exome NGS data. It uses a De Bruijn graph approach to select reads that harbor cycles of appropriate length, followed by assembly using overlap-layout-consensus. We tested ITD Assembler on The Cancer Genome Atlas (TCGA) AML dataset as a truth set. ITD Assembler identified the highest percentage of reported FLT3-ITDs compared with other ITD detection algorithms, and discovered additional ITDs in FLT3, KIT, CEBPA, WT1 and other genes. Evidence of polymorphic ITDs in 54 genes was also found. Novel ITDs were validated by analyzing the corresponding RNA sequencing data.

Conclusions: ITD Assembler is a highly sensitive tool that can detect partial, large and complex tandem duplications. This study highlights the need to look more effectively for ITDs in other cancers and Mendelian diseases.
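The read-selection step described in the abstract can be illustrated with a minimal sketch. It rests on a simplifying assumption that the source does not state: each read is examined on its own, and a "cycle" is a k-mer that recurs within the read, so the path of consecutive k-mers through the De Bruijn graph revisits a node. The cycle length is the distance between the two occurrences and, for a clean tandem duplication of length L (with L >= k), equals L. The function names and the `k`, `min_len` and `max_len` parameters are hypothetical, the overlap-layout-consensus assembly step is not shown, and this is not the ITD Assembler implementation.

```python
from collections.abc import Iterable


def kmer_cycle_lengths(read: str, k: int) -> list[int]:
    """Return the sorted, distinct cycle lengths in a read's De Bruijn path.

    Hypothetical per-read simplification: nodes are k-mers and consecutive
    k-mers of the read form a path. When a k-mer recurs at position j after
    first appearing at position i, the path revisits a node, closing a cycle
    of length j - i. A tandem duplication of a segment of length L (L >= k)
    makes the k-mers inside the duplicated sequence recur at offset L.
    """
    first_seen: dict[str, int] = {}
    cycles: set[int] = set()
    for pos in range(len(read) - k + 1):
        kmer = read[pos:pos + k]
        if kmer in first_seen:
            # Distance back to the first occurrence = length of the cycle.
            cycles.add(pos - first_seen[kmer])
        else:
            first_seen[kmer] = pos
    return sorted(cycles)


def select_itd_candidates(reads: Iterable[str], k: int,
                          min_len: int, max_len: int) -> list[str]:
    """Keep reads whose path has a cycle with min_len <= length <= max_len.

    The length window is a hypothetical stand-in for "cycles of appropriate
    length": the lower bound discards short cycles from low-complexity
    sequence (e.g. homopolymers), the upper bound caps the ITD size sought.
    """
    return [
        read for read in reads
        if any(min_len <= c <= max_len for c in kmer_cycle_lengths(read, k))
    ]


if __name__ == "__main__":
    # Toy read: flank + tandem duplication of GGATCCTA (8 bp) + flank.
    dup_read = "ACGTTGCA" + "GGATCCTA" * 2 + "TTAAC"
    plain_read = "ACGTTGCAGGATCCTATTAAC"   # same sequence, single copy
    homopolymer = "A" * 10
    print(kmer_cycle_lengths(dup_read, 5))     # [8]
    print(kmer_cycle_lengths(homopolymer, 5))  # [1, 2, 3, 4, 5]
    # Only dup_read survives the illustrative 6..300 bp window.
    print(select_itd_candidates([dup_read, plain_read, homopolymer], 5, 6, 300))
```

In this toy example the duplicated 8-bp segment yields a single cycle of length 8, while the homopolymer produces only short cycles that the minimum-length filter removes; the window bounds and k are illustrative values, not parameters taken from the paper.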
【 License 】
CC BY
© Rustagi et al. 2016