BMC Bioinformatics
Combining calls from multiple somatic mutation-callers
Su Yeon Kim [1], Laurent Jacob [3], Terence P Speed [2]
[1] Department of Statistics, University of California at Berkeley, Berkeley CA 94720, USA
[2] Walter and Eliza Hall Institute of Medical Research and the University of Melbourne, Parkville, Victoria, Australia
[3] Laboratoire de Biométrie et Biologie Evolutive, Université de Lyon, Université Lyon 1, CNRS, INRA, UMR5558 Villeurbanne, France
Keywords: Stacking; Combining calls; Somatic mutation-calling; Cancer genome
DOI: 10.1186/1471-2105-15-154
Received 2014-02-16; accepted 2014-05-12; published 2014
【 Abstract 】

Background

Accurate somatic mutation-calling is essential for insightful mutation analyses in cancer studies. Several mutation-callers are publicly available and more are likely to appear. Nonetheless, mutation-calling is still challenging and there is unlikely to be one established caller that systematically outperforms all others. Therefore, fully utilizing multiple callers can be a powerful way to construct a list of final calls for one’s research.

Results

Using a set of mutations from multiple callers that are impartially validated, we present a statistical approach for building a combined caller, which can be applied to combine calls in a wider dataset generated using a similar protocol. Using the mutation outputs and the validation data from The Cancer Genome Atlas endometrial study (6,746 sites), we demonstrate how to build a statistical model that predicts the probability of each call being a somatic mutation, based on the detection status of multiple callers and a few associated features.
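The idea of predicting a per-site probability from the detection status of several callers plus a few features can be sketched as a small logistic regression. This is only an illustration under stated assumptions, not the authors' model: the three callers, their detection rates, the depth-like feature, and all function names are hypothetical, and the training data is simulated rather than taken from the TCGA validation set.

```python
import math
import random

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_logistic(X, y, lr=0.5, epochs=500):
    """Fit logistic regression by batch gradient descent.

    Returns [intercept, w1, ..., wp].
    """
    n, p = len(X), len(X[0])
    w = [0.0] * (p + 1)
    for _ in range(epochs):
        grad = [0.0] * (p + 1)
        for xi, yi in zip(X, y):
            z = w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi))
            err = sigmoid(z) - yi
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * gj / n for wj, gj in zip(w, grad)]
    return w

def predict_proba(w, x):
    # Predicted probability that a site is a true somatic mutation.
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)))

def simulate_sites(n, rng):
    """Simulate validated sites: three hypothetical callers plus one feature."""
    # Made-up detection rates per caller: (rate at false sites, rate at true sites).
    rates = [(0.10, 0.90), (0.30, 0.80), (0.45, 0.60)]
    X, y = [], []
    for _ in range(n):
        truth = 1 if rng.random() < 0.5 else 0
        calls = [1.0 if rng.random() < r[truth] else 0.0 for r in rates]
        depth = rng.gauss(0.5 if truth else -0.5, 1.0)  # hypothetical feature
        X.append(calls + [depth])
        y.append(truth)
    return X, y

rng = random.Random(0)
X_train, y_train = simulate_sites(300, rng)
weights = fit_logistic(X_train, y_train)

p_all = predict_proba(weights, [1.0, 1.0, 1.0, 0.0])
p_none = predict_proba(weights, [0.0, 0.0, 0.0, 0.0])
print(f"P(somatic | called by all three) = {p_all:.3f}")
print(f"P(somatic | called by none)      = {p_none:.3f}")
```

In this sketch each caller's binary call enters as one input, so the fitted weights express how much the combined model trusts each caller given the others. A regularized fit (for example a lasso penalty) would be a natural variation, but it is not shown here.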

Conclusion

The approach allows us to build a combined caller across the full range of stringency levels, which outperforms all of the individual callers.
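Because the combined caller outputs a probability for each site, different stringency levels correspond to different probability cutoffs. The following minimal sketch shows how a list of final calls could be derived at several cutoffs and scored against validation labels. The probabilities, labels, and the `sweep_thresholds` helper are invented for illustration and do not come from the study.

```python
def sweep_thresholds(probs, truth, thresholds):
    """Return (threshold, sensitivity, precision) for each probability cutoff."""
    total_true = sum(truth)
    rows = []
    for t in thresholds:
        called = [p >= t for p in probs]
        tp = sum(1 for c, y in zip(called, truth) if c and y == 1)
        n_called = sum(called)
        sensitivity = tp / total_true if total_true else 0.0
        # With no calls at all, precision is reported as 1.0 by convention.
        precision = tp / n_called if n_called else 1.0
        rows.append((t, sensitivity, precision))
    return rows

# Made-up predicted probabilities and validation labels for eight sites.
probs = [0.95, 0.90, 0.80, 0.60, 0.55, 0.30, 0.20, 0.05]
truth = [1, 1, 1, 0, 1, 0, 0, 0]

rows = sweep_thresholds(probs, truth, [0.25, 0.5, 0.85])
for t, sens, prec in rows:
    print(f"threshold={t:.2f}  sensitivity={sens:.2f}  precision={prec:.2f}")
```

Raising the cutoff trades sensitivity for precision, so a single fitted model can serve both lenient and stringent analyses.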

【 License 】

   
2014 Kim et al.; licensee BioMed Central Ltd.
