Journal article details
BMC Genomics
SNooPer: a machine learning-based method for somatic variant identification from low-pass next-generation sequencing
Software
Pamela Mehanna [1], Manon Ouimet [1], Ramon Vidal [1], Jasmine Healy [1], Pauline Cassart [1], Jean-François Spinella [1], Virginie Saillour [1], Chantal Richer [1], Daniel Sinnett [2]
[1] CHU Sainte-Justine Research Center, Université de Montréal, Montreal, QC, Canada; Department of Pediatrics, Faculty of Medicine, Université de Montréal, Montreal, QC, Canada; Division of Hematology-Oncology, CHU Sainte-Justine Research Center, 3175 Côte Sainte-Catherine, H3T 1C5, Montreal, QC, Canada
Keywords: Somatic variant; Low-pass next-generation sequencing; Machine learning; Random Forest
DOI: 10.1186/s12864-016-3281-2
Received 2016-05-26, accepted 2016-11-09, published 2016
Source: Springer
【 Abstract 】

Background: Next-generation sequencing (NGS) allows unbiased, in-depth interrogation of cancer genomes. Many somatic variant callers have been developed yet accurate ascertainment of somatic variants remains a considerable challenge as evidenced by the varying mutation call rates and low concordance among callers. Statistical model-based algorithms that are currently available perform well under ideal scenarios, such as high sequencing depth, homogeneous tumor samples, high somatic variant allele frequency (VAF), but show limited performance with sub-optimal data such as low-pass whole-exome/genome sequencing data. While the goal of any cancer sequencing project is to identify a relevant, and limited, set of somatic variants for further sequence/functional validation, the inherently complex nature of cancer genomes combined with technical issues directly related to sequencing and alignment can affect either the specificity and/or sensitivity of most callers.

Results: For these reasons, we developed SNooPer, a versatile machine learning approach that uses Random Forest classification models to accurately call somatic variants in low-depth sequencing data. SNooPer uses a subset of variant positions from the sequencing output for which the class, true variation or sequencing error, is known to train the data-specific model. Here, using a real dataset of 40 childhood acute lymphoblastic leukemia patients, we show how the SNooPer algorithm is not affected by low coverage or low VAFs, and can be used to reduce overall sequencing costs while maintaining high specificity and sensitivity to somatic variant calling. When compared to three benchmarked somatic callers, SNooPer demonstrated the best overall performance.

Conclusions: While the goal of any cancer sequencing project is to identify a relevant, and limited, set of somatic variants for further sequence/functional validation, the inherently complex nature of cancer genomes combined with technical issues directly related to sequencing and alignment can affect either the specificity and/or sensitivity of most callers. The flexibility of SNooPer’s random forest protects against technical bias and systematic errors, and is appealing in that it does not rely on user-defined parameters. The code and user guide can be downloaded at https://sourceforge.net/projects/snooper/.
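
As a rough illustration of why low depth and low VAF are hard for read-count-based callers, the sketch below computes a VAF and the chance of observing a minimum number of variant-supporting reads. It is not part of SNooPer and does not reproduce its Random Forest model. The binomial sampling assumption, the function names `vaf` and `prob_at_least`, and the example settings (10% VAF, 3 supporting reads) are all hypothetical choices made for illustration.

```python
from math import comb


def vaf(alt_reads: int, depth: int) -> float:
    """Variant allele frequency: fraction of reads supporting the variant."""
    if depth <= 0:
        raise ValueError("depth must be positive")
    return alt_reads / depth


def prob_at_least(k: int, depth: int, true_vaf: float) -> float:
    """P(at least k variant reads at a site), assuming each read
    independently carries the variant with probability true_vaf
    (a simplifying binomial assumption, not SNooPer's model)."""
    p_fewer = sum(
        comb(depth, i) * true_vaf**i * (1 - true_vaf) ** (depth - i)
        for i in range(k)
    )
    return 1.0 - p_fewer


if __name__ == "__main__":
    # Hypothetical setting: a variant at 10% VAF, requiring 3 supporting reads.
    for depth in (10, 30, 100):
        print(depth, round(prob_at_least(3, depth, 0.10), 3))
```

Under these assumptions, the probability of seeing enough supporting reads drops sharply as depth decreases, which is the regime the abstract describes as sub-optimal for statistical model-based callers.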

【 License 】

CC BY   
© The Author(s). 2016
