Journal Article Details
BMC Bioinformatics
Comparing somatic mutation-callers: beyond Venn diagrams
Research Article
Su Yeon Kim1  Terence P Speed2 
[1] Department of Statistics, University of California at Berkeley, 367 Evans Hall, 94720, Berkeley, CA, USA; Walter and Eliza Hall Institute of Medical Research, Parkville, Victoria, Australia
Keywords: Cancer genome; Next-generation sequencing; Somatic mutation-calling; Methods comparison; Validation
DOI: 10.1186/1471-2105-14-189
Received 2013-02-19, accepted 2013-05-30, published 2013
Source: Springer
【 Abstract 】

Background: Somatic mutation-calling based on DNA from matched tumor-normal patient samples is one of the key tasks carried out by many cancer genome projects. One such large-scale project is The Cancer Genome Atlas (TCGA), which is now routinely compiling catalogs of somatic mutations from hundreds of paired tumor-normal DNA exome-sequence data. Nonetheless, mutation calling is still very challenging. TCGA benchmark studies revealed that even relatively recent mutation callers from major centers showed substantial discrepancies. Evaluating the mutation callers or understanding the sources of discrepancies is not straightforward, since for most tumor studies, validation data based on independent whole-exome DNA sequencing is not available; only partial validation data exist, for a selected (ascertained) subset of sites.

Results: To provide guidelines for comparing outputs from multiple callers, we have analyzed two sets of mutation-calling data from the TCGA benchmark studies and their partial validation data. Various aspects of the mutation-calling outputs were explored to characterize the discrepancies in detail. To assess the performance of multiple callers, we introduce four approaches that use external sequence data to varying degrees: independent DNA-seq pairs, RNA-seq for tumor samples only, the original exome-seq pairs only, or none of those.

Conclusions: Our analyses provide guidelines for visualizing and understanding the discrepancies among the outputs from multiple callers. Furthermore, by applying the four evaluation approaches to the whole-exome data, we illustrate the challenges and highlight the various circumstances that require extra caution in assessing the performance of multiple callers.

【 License 】

CC BY   
© Kim and Speed; licensee BioMed Central Ltd. 2013
