Journal Article Details
BMC Genomics
DB2: a probabilistic approach for accurate detection of tandem duplication breakpoints using paired-end reads
Methodology Article
Mehmet Koyutürk [1], Gökhan Yavaş [2], Meetha P Gould [3], Sarah McMahon [3], Thomas LaFramboise [4]
[1] Department of Electrical Engineering & Computer Science, Case Western Reserve University, 10900 Euclid Avenue, 44106, Cleveland, OH, USA;Center for Proteomics and Bioinformatics, Case Western Reserve University, 10900 Euclid Avenue, 44106, Cleveland, OH, USA;Department of Epidemiology & Biostatistics, Case Western Reserve University, 10900 Euclid Avenue, 44106, Cleveland, OH, USA;Department of Genetics and Genome Sciences, Case Western Reserve University, 10900 Euclid Avenue, 44106, Cleveland, OH, USA;Department of Genetics and Genome Sciences, Case Western Reserve University, 10900 Euclid Avenue, 44106, Cleveland, OH, USA;Center for Proteomics and Bioinformatics, Case Western Reserve University, 10900 Euclid Avenue, 44106, Cleveland, OH, USA;Genomic Medicine Institute, Lerner Research Institute, Cleveland Clinic Foundation, 9500 Euclid Avenue, 44195, Cleveland, OH, USA;
Keywords: Tandem Duplication; Read Pair; Donor Genome; Read Library; Fragment Length Distribution
DOI: 10.1186/1471-2164-15-175
Received 2013-06-12; accepted 2014-02-18; published 2014
Source: Springer
【 Abstract 】

Background: With the advent of paired-end high-throughput sequencing, it is now possible to identify various types of structural variation on a genome-wide scale. Although many methods have been proposed for structural variation detection, most do not provide precise boundaries for the variants they identify. In this paper, we propose a new method, Distribution Based detection of Duplication Boundaries (DB2), for accurate detection of the breakpoints of tandem duplications, an important class of structural variation, with high precision and recall.

Results: Our computational experiments on simulated data show that DB2 outperforms state-of-the-art methods in finding the breakpoints of tandem duplications, with a higher positive predictive value (precision) in calling the duplications' presence. In particular, DB2's predictions of tandem duplications are correct 99% of the time even for very noisy data, while narrowing the space of possible breakpoints to within a margin of 15 to 20 bp on average. Most existing methods report boundaries as ranges that extend to hundreds of bases, with lower precision. Our method is also highly robust to varying properties of the sequencing library and to the sizes of the tandem duplications, as shown by its stable precision, recall, and mean boundary mismatch. We demonstrate our method's efficacy using both simulated paired-end reads and reads generated from a melanoma sample and two ovarian cancer samples. Newly discovered tandem duplications are validated using PCR and Sanger sequencing.

Conclusions: Our method, DB2, uses discordantly aligned reads and takes the fragment length distribution into account to predict tandem duplications, along with their breakpoints, on a donor genome. The method fine-tunes the breakpoint calls by applying a novel probabilistic framework that incorporates the empirical fragment length distribution to score each feasible breakpoint.

DB2 is implemented in the Java programming language and is freely available at http://mendel.gene.cwru.edu/laframboiselab/software.php.
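The idea of scoring feasible breakpoints with the empirical fragment length distribution can be illustrated with a minimal sketch. This is not DB2's actual algorithm, which the abstract does not specify; it is a toy model under stated assumptions. A tandem duplication of the reference segment [a, b] is assumed to place a second copy of the segment directly after the first, and each discordant read pair is reduced to two reference coordinates: the start of the forward read (mapping near the end of the segment) and the end of the reverse read (mapping near its start). All function names and the pseudocount are hypothetical.

```python
import math
from collections import Counter


def empirical_log_pmf(fragment_lengths, pseudocount=1e-6):
    """Build log P(length) from observed library fragment lengths.

    The pseudocount (an assumption of this sketch) keeps unseen
    lengths from producing log(0).
    """
    counts = Counter(fragment_lengths)
    total = len(fragment_lengths)

    def log_p(length):
        return math.log(counts.get(length, 0) / total + pseudocount)

    return log_p


def implied_length(pair, a, b):
    """Donor fragment length implied by candidate breakpoints (a, b).

    The fragment runs from fwd_start to b in the first copy, then
    from a to rev_end in the second copy.
    """
    fwd_start, rev_end = pair
    return (b - fwd_start + 1) + (rev_end - a + 1)


def score_breakpoints(pairs, a, b, log_p):
    """Sum of log-probabilities of the implied fragment lengths.

    Returns -inf when a pair cannot lie inside [a, b] (infeasible).
    """
    total = 0.0
    for fwd_start, rev_end in pairs:
        if a > rev_end or fwd_start > b:
            return float("-inf")
        total += log_p(implied_length((fwd_start, rev_end), a, b))
    return total


def best_breakpoints(pairs, a_candidates, b_candidates, log_p):
    """Exhaustively pick the (score, a, b) with the highest score."""
    best = None
    for a in a_candidates:
        for b in b_candidates:
            s = score_breakpoints(pairs, a, b, log_p)
            if best is None or s > best[0]:
                best = (s, a, b)
    return best


# Hypothetical library and two discordant pairs around a duplication.
library = [298, 299, 300, 300, 300, 301, 302]
log_p = empirical_log_pmf(library)
pairs = [(1900, 1198), (1950, 1248)]
print(best_breakpoints(pairs, range(990, 1011), [2000], log_p))
```

In this simplified model the implied length depends on a and b only through b - a, so the fragment length distribution alone constrains the duplication size; bounding each breakpoint individually relies on the positions of the reads themselves, which is why the sketch rejects candidates that exclude a read.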

【 License 】

© Yavaş et al.; licensee BioMed Central Ltd. 2014. This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited.
