Journal Article Details
BMC Bioinformatics
Predicting the recurrence of noncoding regulatory mutations in cancer
Research Article
Jung Kyoon Choi [1], Woojin Yang [1], Kiwon Jang [1], Hyoeun Bang [1], Min Kyung Sung [1]
[1] Department of Bio and Brain Engineering, KAIST, Daejeon, Republic of Korea
Keywords: Random Forest; Differentially Express Gene; Recurrent Mutation; Random Forest Classifier; Recurrence Model
DOI: 10.1186/s12859-016-1385-y
Received 2016-06-24, accepted 2016-11-26, published 2016
Source: Springer
【 Abstract 】

Background: One of the greatest challenges in cancer genomics is to distinguish driver mutations from passenger mutations. Whereas recurrence is a hallmark of driver mutations, recurring noncoding mutations are difficult to observe because few whole-genome sequenced samples are available. Hence, a method is needed to predict potentially recurrent mutations.

Results: In this work, we developed a random forest classifier that predicts regulatory mutations that may recur, based on the features of the mutations repeatedly appearing in a given cohort. With breast cancer as a model, we profiled 35 quantitative features describing genetic and epigenetic signals at the mutation site, transcription factors whose binding motif was disrupted by the mutation, and genes targeted by long-range chromatin interactions. A true set of mutations for machine learning was generated by interrogating publicly available pan-cancer genomes based on our statistical model of mutation recurrence. The performance of our random forest classifier was evaluated by cross-validation, and the variable importance of each feature in the classification of mutations was investigated. With our statistical recurrence model, the random forest classifier showed an area under the curve (AUC) of ~0.78 in predicting recurrent mutations. Chromatin accessibility at the mutation sites, the distance from the mutations to known cancer risk loci, and the role of the target genes in the regulatory or protein interaction network were among the most important variables.

Conclusions: Our methods make it possible to characterize recurrent regulatory mutations using a limited number of whole-genome samples and, based on that characterization, to predict potential driver mutations whose recurrence is not found in the given samples but is likely to be observed with additional samples.
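The AUC reported in the abstract can be read as the probability that a randomly chosen recurrent mutation receives a higher classifier score than a randomly chosen non-recurrent one. The sketch below illustrates that reading only. It is not the authors' pipeline: the function name `roc_auc` and the toy labels and scores are hypothetical, and a real evaluation would use held-out predictions from each cross-validation fold.

```python
from itertools import product


def roc_auc(labels, scores):
    """Rank-based AUC: fraction of (positive, negative) pairs in which the
    positive example scores higher; tied pairs count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p, n in product(pos, neg)
    )
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = recurrent mutation, 0 = non-recurrent mutation.
labels = [1, 1, 1, 0, 0, 0]
# Hypothetical classifier scores (e.g. predicted probability of recurrence).
scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2]

auc = roc_auc(labels, scores)
print(f"AUC = {auc:.3f}")
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, so a value near 0.78 indicates that recurrent mutations are ranked above non-recurrent ones in roughly four of every five such pairs.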

【 License 】

CC BY   
© The Author(s). 2016
