Journal Article Details
BMC Bioinformatics
A Bayesian network approach to feature selection in mass spectrometry data
Methodology Article
OJ Semmes [1], Lisa H Cazares [1], Dariya I Malyarenko [2], Eugene R Tracy [2], William E Cooke [2], Karl W Kuschner [2]
[1] Center for Biomedical Proteomics, Eastern Virginia Medical School, Norfolk, VA, USA
[2] Department of Physics, The College of William and Mary, Williamsburg, VA, USA
Keywords: Mutual Information; Bayesian Network; Feature Selection Method; Bayesian Belief Network; Disease Class
DOI: 10.1186/1471-2105-11-177
Received 2009-09-01, accepted 2010-04-08, published 2010
Source: Springer
【 Abstract 】

Background: Time-of-flight mass spectrometry (TOF-MS) has the potential to provide non-invasive, high-throughput screening for cancers and other serious diseases via detection of protein biomarkers in blood or other accessible biologic samples. Unfortunately, this potential has largely been unrealized to date due to the high variability of measurements, uncertainties in the distribution of proteins in a given population, and the difficulty of extracting repeatable diagnostic markers using current statistical tools. With studies consisting of perhaps only dozens of samples, and possibly hundreds of variables, overfitting is a serious complication. To overcome these difficulties, we have developed a Bayesian inductive method which uses model-independent methods of discovering relationships between spectral features. This method appears to efficiently discover network models which not only identify connections between the disease and key features, but also organize relationships between features. It furthermore creates a stable classifier that categorizes new data at predicted error rates.

Results: The method was applied to artificial data with known feature relationships and typical TOF-MS variability introduced, and was able to recover those relationships nearly perfectly. It was also applied to blood sera data from a 2004 leukemia study, and showed high stability of selected features under cross-validation. Verification of results using withheld data showed excellent predictive power. The method showed improvement over traditional techniques, and naturally incorporated measurement uncertainties. The relationships discovered between features allowed preliminary identification of a protein biomarker which was consistent with other cancer studies and later verified experimentally.

Conclusions: This method appears to avoid overfitting in biologic data and produce stable feature sets in a network model. The network structure provides additional information about the relationships among features that is useful to guide further biochemical analysis. In addition, when used to classify new data, these feature sets are far more consistent than those produced by many traditional techniques.
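
The abstract does not spell out how relationships between spectral features and the disease class are scored, but the keywords name mutual information. As a rough illustration only, and not the authors' procedure, the sketch below ranks features by their estimated mutual information with a binary class label. The function names, the median-based two-level discretization, and the toy intensities are all assumptions made for this example.

```python
import math
from collections import Counter
from statistics import median


def mutual_information(xs, ys):
    """Estimate I(X;Y) in bits from paired discrete observations."""
    n = len(xs)
    if n == 0 or n != len(ys):
        raise ValueError("xs and ys must be non-empty and of equal length")
    joint = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    mi = 0.0
    for (x, y), count in joint.items():
        # p(x,y) * log2(p(x,y) / (p(x) * p(y))), written in terms of counts
        mi += (count / n) * math.log2(count * n / (px[x] * py[y]))
    return mi


def discretize(values):
    """Assumed discretization: 1 if above the feature's median, else 0."""
    cut = median(values)
    return [1 if v > cut else 0 for v in values]


def rank_features(features, labels):
    """Return (name, MI with labels) pairs, highest MI first."""
    scores = {
        name: mutual_information(discretize(values), labels)
        for name, values in features.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Made-up intensities for six samples; not data from the study.
    labels = [0, 0, 0, 1, 1, 1]
    features = {
        "peak_a": [1.0, 1.2, 0.9, 3.1, 2.8, 3.3],  # tracks the class
        "peak_b": [2.0, 5.0, 3.0, 2.1, 5.2, 2.9],  # largely unrelated
    }
    for name, score in rank_features(features, labels):
        print(f"{name}: {score:.3f} bits")
```

With so few samples, count-based estimates like this are biased upward, which is one face of the overfitting concern the abstract raises; a network-based method would use such pairwise scores only as a starting point.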

【 License 】

CC BY   
© Kuschner et al; licensee BioMed Central Ltd. 2010
