Journal Article Details
BMC Bioinformatics
Enriching for correct prediction of biological processes using a combination of diverse classifiers
Research Article
Daijin Ko [1], Brad Windle [2]
[1] Department of Management Science and Statistics, School of Business, University of Texas at San Antonio, San Antonio, TX, USA;UTSA Neuroscience Institute, University of Texas at San Antonio, San Antonio, TX, USA;Department of Medicinal Chemistry, School of Pharmacy, Virginia Commonwealth University, Richmond, VA, USA;Massey Cancer Center, Virginia Commonwealth University, Richmond, VA, USA;
Keywords: Random Forest; Precision Level; Isotonic Regression; Combine Classifier; Precision Index
DOI: 10.1186/1471-2105-12-189
Received 2010-05-12, accepted 2011-05-23, published 2011
Source: Springer
【 Abstract 】

Background: Machine learning models (classifiers) that assign genes to biological processes each have their own characteristics in which genes they can classify and to which biological processes. No single learning model is qualitatively superior to any other, and the overall precision of each model tends to be low. The classification results of the individual classifiers can be complementary and synergistic, which suggests a benefit from combining algorithms, but the prediction probability outputs of different learning models are often neither comparable nor compatible for combining. A means is needed to compare outputs regardless of the model and data used, and to combine the results into an improved comprehensive model.

Results: Gene expression patterns from NCI's panel of 60 cell lines were used to train a Random Forest, a Support Vector Machine and a Neural Network model, plus two over-sampled models, for classifying genes to biological processes. Each model produced classification results with unique characteristics. We introduce the Precision Index measure (PIN), derived from the maximum posterior probability, which allows multiple classifiers to be assessed, compared and combined. We also introduce the class-specific precision measure (PIC) and use it to select a high-precision subset of predictions across all classes and all classifiers. We developed a single classifier that combines the PINs from these five models in prediction and found that the PIN Combined Classifier (PINCom) significantly increased the number of correctly predicted genes over any single classifier. Applied to test genes that were not used in training, PINCom also showed substantial improvement over any single model.

Conclusions: This paper introduces novel and effective ways of assessing predictions by their precision and recall, plus a method that combines several machine learning models and capitalizes on synergy and complementation in class selection, resulting in higher precision and recall. Different machine learning models yielded incongruent results, which were successfully combined into one superior model using the PIN measure we developed. Validation of the boosted predictions for gene functions showed the genes to be accurately predicted.
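
The abstract does not spell out how PIN is computed or how PINCom chooses among classifiers. The sketch below is only one plausible reading, offered as an illustration: it assumes PIN is a monotone (isotonic regression) estimate of precision as a function of a classifier's maximum posterior probability, fitted on genes with known labels, and that the combined classifier keeps the prediction with the highest PIN. The function names `fit_isotonic`, `pin` and `combine`, the class labels, and all numbers are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence, Tuple

Model = List[Tuple[float, float]]  # (upper score bound of a step, estimated precision)


def fit_isotonic(scores: Sequence[float], correct: Sequence[int]) -> Model:
    """Pool-adjacent-violators fit of a nondecreasing step function.

    scores:  maximum posterior probability of each training prediction
    correct: 1 if that prediction was right, 0 otherwise
    ASSUMPTION: this is a generic isotonic regression, not the paper's exact procedure.
    """
    blocks: List[List[float]] = []  # each block: [sum of labels, count, upper score]
    for s, y in sorted(zip(scores, correct)):
        if blocks and blocks[-1][2] == s:
            # Tied scores share one step.
            blocks[-1][0] += y
            blocks[-1][1] += 1
        else:
            blocks.append([float(y), 1, s])
        # Merge backwards while the step means would decrease.
        while len(blocks) > 1 and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]:
            total, count, upper = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
            blocks[-1][2] = upper
    return [(upper, total / count) for total, count, upper in blocks]


def pin(model: Model, score: float) -> float:
    """Look up the estimated precision for a maximum posterior probability."""
    for upper, value in model:
        if score <= upper:
            return value
    return model[-1][1]


def combine(predictions: Sequence[Tuple[str, float]], models: Sequence[Model]) -> Tuple[str, float]:
    """Keep the (label, PIN) of the classifier whose prediction has the highest PIN.

    ASSUMPTION: a max-PIN rule; the source only says the PINs are combined.
    """
    scored = [(pin(m, p), label) for (label, p), m in zip(predictions, models)]
    best_pin, best_label = max(scored, key=lambda t: t[0])
    return best_label, best_pin


# Hypothetical calibration data for two classifiers.
model_a = fit_isotonic([0.2, 0.4, 0.6, 0.8], [0, 0, 1, 1])
model_b = fit_isotonic([0.5, 0.6, 0.7, 0.9], [1, 0, 1, 0])

# One gene: classifier B is more confident, but A's confidence is better calibrated.
label, score = combine([("apoptosis", 0.7), ("cell cycle", 0.9)], [model_a, model_b])
print(label, score)
```

In this toy case the raw posterior probabilities favor classifier B (0.9 against 0.7), while the calibrated values favor classifier A. That is the kind of cross-model comparison the abstract says raw probabilities do not support.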

【 License 】

CC BY   
© Ko and Windle; licensee BioMed Central Ltd. 2011
