Journal Article Details
BMC Bioinformatics
Lack of sufficiently strong informative features limits the potential of gene expression analysis as predictive tool for many clinical classification problems
Research Article
Takayuki Iwamoto [1], Lajos Pusztai [1], Yuan Qi [2], Kenneth R Hess [2], Caimiao Wei [2], W Fraser Symmans [3]
[1] Breast Medical Oncology, University of Texas MD Anderson Cancer Center, Houston, Texas, USA
[2] Department of Biostatistics, University of Texas MD Anderson Cancer Center, Houston, Texas, USA
[3] Pathology, University of Texas MD Anderson Cancer Center, Houston, Texas, USA
Keywords: Fold Difference; Informative Feature; Informative Case; Diagonal Linear Discriminant Analysis; Spiked Probe
DOI: 10.1186/1471-2105-12-463
Received 2011-07-21, accepted 2011-12-01, publication year 2011
Source: Springer
【 Abstract 】

Background: Our goal was to examine how various aspects of a gene signature influence the success of developing multi-gene prediction models. We inserted gene signatures into three real data sets by altering the expression level of existing probe sets. We varied the number of probe sets perturbed (signature size), the fold increase of mean probe set expression in perturbed compared to unperturbed data (signature strength), and the number of samples perturbed. Prediction models were trained to identify which cases had been perturbed. Performance was estimated using Monte Carlo cross-validation.

Results: Signature strength had the greatest influence on predictor performance. It was possible to develop almost perfect predictors with as few as 10 features if the fold difference in mean expression values was > 2, even when the spiked samples represented only 10% of all samples. We also assessed the gene signature set size and strength for 9 real clinical prediction problems in six different breast cancer data sets.

Conclusions: We found sufficiently large and strong predictive signatures only for distinguishing ER-positive from ER-negative cancers; there were no strong signatures for more subtle prediction problems. Current statistical methods efficiently identify highly informative features in gene expression data if such features exist, and accurate models can be built with as few as 10 highly informative features. Features can be considered highly informative if at least a 2-fold expression difference exists between comparison groups, but such features do not appear to be common for many clinically relevant prediction problems in human data sets.
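The spiking experiment described in the abstract can be illustrated with a minimal sketch. It is not the authors' code and rests on several assumptions that the abstract does not confirm: expression values are on the log2 scale (so a fold increase becomes an additive shift of log2(fold)), a synthetic Gaussian matrix stands in for the real data sets, probes are ranked by a simple standardized mean difference, the classifier is diagonal linear discriminant analysis with equal priors (the method named in the keywords), and performance is summarized as balanced accuracy. The function names `spike_signature`, `dlda_predict`, and `monte_carlo_cv` are hypothetical.

```python
import numpy as np

def spike_signature(X, n_cases, n_probes, fold, rng):
    """Add log2(fold) to n_probes random probes in n_cases random samples.

    X is assumed to be a (samples x probes) matrix on the log2 scale.
    Returns the perturbed copy and 0/1 labels (1 = spiked sample).
    """
    X = X.copy()
    cases = rng.choice(X.shape[0], size=n_cases, replace=False)
    probes = rng.choice(X.shape[1], size=n_probes, replace=False)
    X[np.ix_(cases, probes)] += np.log2(fold)
    y = np.zeros(X.shape[0], dtype=int)
    y[cases] = 1
    return X, y

def dlda_predict(X_train, y_train, X_test, n_top):
    """Select n_top probes on the training set, then apply DLDA (equal priors)."""
    g0, g1 = X_train[y_train == 0], X_train[y_train == 1]
    m0, m1 = g0.mean(axis=0), g1.mean(axis=0)
    # Pooled within-class variance per probe (the "diagonal" covariance).
    ss = ((g0 - m0) ** 2).sum(axis=0) + ((g1 - m1) ** 2).sum(axis=0)
    var = ss / (len(y_train) - 2) + 1e-8
    # Rank probes by standardized mean difference; keep the n_top largest.
    top = np.argsort(np.abs(m1 - m0) / np.sqrt(var))[-n_top:]
    weights = (m1[top] - m0[top]) / var[top]
    midpoint = (m0[top] + m1[top]) / 2
    return ((X_test[:, top] - midpoint) @ weights > 0).astype(int)

def monte_carlo_cv(X, y, n_top=10, n_splits=50, test_frac=0.3, seed=0):
    """Mean balanced accuracy over repeated random stratified train/test splits."""
    rng = np.random.default_rng(seed)
    scores = []
    for _ in range(n_splits):
        test = np.zeros(len(y), dtype=bool)
        for label in (0, 1):  # stratify so both classes appear in train and test
            idx = rng.permutation(np.flatnonzero(y == label))
            test[idx[: max(1, round(test_frac * len(idx)))]] = True
        # Feature selection happens inside dlda_predict, on training data only.
        pred = dlda_predict(X[~test], y[~test], X[test], n_top)
        recalls = [(pred[y[test] == c] == c).mean() for c in (0, 1)]
        scores.append(np.mean(recalls))
    return float(np.mean(scores))

# Synthetic stand-in for a real expression data set: 100 samples x 500 probes.
rng = np.random.default_rng(1)
background = rng.normal(8.0, 0.5, size=(100, 500))
for fold in (1.0, 1.2, 2.0):
    # Spike 10 probes in 10% of the samples at the given signature strength.
    X, y = spike_signature(background, n_cases=10, n_probes=10, fold=fold, rng=rng)
    print(f"fold={fold}: balanced accuracy={monte_carlo_cv(X, y):.3f}")
```

Selecting probes inside each split, rather than once on the full matrix, keeps the test samples out of feature selection, which is what makes the cross-validated estimate meaningful. A fold of 1.0 leaves the data unperturbed and serves as a no-signal baseline.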

【 License 】

© Hess et al; licensee BioMed Central Ltd. 2011. This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
