| BMC Bioinformatics | |
| Combining calls from multiple somatic mutation-callers | |
| Su Yeon Kim [1], Laurent Jacob [3], Terence P Speed [2] | |
| [1] Department of Statistics, University of California at Berkeley, Berkeley CA 94720, USA | |
| [2] Walter and Eliza Hall Institute of Medical Research and the University of Melbourne, Parkville, Victoria, Australia | |
| [3] Laboratoire de Biométrie et Biologie Evolutive, Université de Lyon, Université Lyon 1, CNRS, INRA, UMR5558 Villeurbanne, France | |
| Keywords: Stacking; Combining calls; Somatic mutation-calling; Cancer genome | |
| DOI: 10.1186/1471-2105-15-154 | |
| Received 2014-02-16, accepted 2014-05-12, published 2014 | |
【 Abstract 】
Background
Accurate somatic mutation-calling is essential for insightful mutation analyses in cancer studies. Several mutation-callers are publicly available and more are likely to appear. Nonetheless, mutation-calling is still challenging and there is unlikely to be one established caller that systematically outperforms all others. Therefore, fully utilizing multiple callers can be a powerful way to construct a list of final calls for one’s research.
Results
Using a set of mutations from multiple callers that are impartially validated, we present a statistical approach for building a combined caller, which can be applied to combine calls in a wider dataset generated using a similar protocol. Using the mutation outputs and the validation data from The Cancer Genome Atlas endometrial study (6,746 sites), we demonstrate how to build a statistical model that predicts the probability of each call being a somatic mutation, based on the detection status of multiple callers and a few associated features.
Conclusion
The approach allows us to build a combined caller across the full range of stringency levels, which outperforms all of the individual callers.
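As a rough illustration of the idea in the Results, the sketch below fits a plain logistic regression that maps the detection status of several callers to the probability that a site is a true somatic mutation. It is a minimal sketch under stated assumptions, not the authors' implementation: the three callers, their sensitivities and false-positive rates, and the simulated "validation" labels are all hypothetical, and the model omits the additional features and any regularization the study may have used.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(weights, calls):
    """Probability that a site is a true mutation, given 0/1 caller statuses."""
    z = weights[0] + sum(w * c for w, c in zip(weights[1:], calls))
    return sigmoid(z)


def fit_logistic(X, y, lr=0.5, epochs=1000):
    """Fit logistic regression by batch gradient descent.

    Returns [intercept, w_1, ..., w_p], one weight per caller.
    """
    n, p = len(X), len(X[0])
    weights = [0.0] * (p + 1)
    for _ in range(epochs):
        grad = [0.0] * (p + 1)
        for calls, label in zip(X, y):
            err = predict_proba(weights, calls) - label
            grad[0] += err
            for j, c in enumerate(calls):
                grad[j + 1] += err * c
        weights = [w - lr * g / n for w, g in zip(weights, grad)]
    return weights


# Hypothetical callers: (sensitivity, false-positive rate) are made-up values.
CALLERS = [(0.9, 0.3), (0.7, 0.1), (0.6, 0.2)]


def simulate_site(rng):
    """Simulate one validated site: caller statuses plus a true/false label."""
    truth = 1 if rng.random() < 0.5 else 0
    calls = [
        1 if rng.random() < (sens if truth else fpr) else 0
        for sens, fpr in CALLERS
    ]
    return calls, truth


rng = random.Random(0)
sites = [simulate_site(rng) for _ in range(400)]
X = [calls for calls, _ in sites]
y = [truth for _, truth in sites]

weights = fit_logistic(X, y)

# Score two detection patterns: called by all three callers vs. only the third.
p_all = predict_proba(weights, [1, 1, 1])
p_single = predict_proba(weights, [0, 0, 1])
print(f"P(somatic | all callers)   = {p_all:.3f}")
print(f"P(somatic | caller 3 only) = {p_single:.3f}")
```

Thresholding the predicted probability at different cut-offs yields a combined caller at different stringency levels, which is the sense in which a single fitted model covers the full range of stringency.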
【 License 】
2014 Kim et al.; licensee BioMed Central Ltd.