BMC Bioinformatics
A genetic algorithm-Bayesian network approach for the analysis of metabolomics and spectroscopic data: application to the rapid identification of Bacillus spores and classification of Bacillus species
Methodology Article
Elon Correa1  Royston Goodacre2
[1] School of Chemistry, The University of Manchester, 131 Princess Street, M1 7ND, Manchester, UK; School of Chemistry, The University of Manchester, 131 Princess Street, M1 7ND, Manchester, UK; Manchester Centre for Integrative Systems Biology, Manchester Interdisciplinary Biocentre, University of Manchester, 131 Princess Street, M1 7ND, Manchester, UK
Keywords: Genetic Algorithm; Feature Selection; Bayesian Network; Predictive Accuracy; Bacillus Species
DOI: 10.1186/1471-2105-12-33
Received 2010-06-10, accepted 2011-01-26, published 2011
Source: Springer
【 Abstract 】
Background: The rapid identification of Bacillus spores and bacterial identification are paramount because of their implications in food poisoning and pathogenesis and their use as potential biowarfare agents. Many automated analytical techniques, such as Curie-point pyrolysis mass spectrometry (Py-MS), have been used to identify bacterial spores, giving rise to large amounts of analytical data. This high number of features makes interpretation of the data extremely difficult. We analysed Py-MS data from 36 different strains of aerobic endospore-forming bacteria encompassing seven different species. These bacteria were grown axenically on nutrient agar, and vegetative biomass and spores were analysed by Curie-point Py-MS.
Results: We develop a novel genetic algorithm-Bayesian network (GA-BN) algorithm that accurately identifies and selects a small subset of key relevant mass spectra (biomarkers) to be further analysed. Once identified, this subset of relevant biomarkers was used to identify Bacillus spores successfully and to identify Bacillus species via a Bayesian network model built specifically for this reduced set of features.
Conclusions: This final compact Bayesian network classification model is parsimonious and computationally fast to run, and its graphical visualization allows easy interpretation of the probabilistic relationships among the selected biomarkers. In addition, we compare the features selected by the GA-BN approach with the features selected by partial least squares-discriminant analysis (PLS-DA). The classification accuracy results show that the set of features selected by the GA-BN is far superior to that selected by PLS-DA.
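The abstract does not give the details of the GA-BN algorithm, so the following is only a minimal sketch of the general idea it describes: a genetic algorithm searches over feature subsets, and each subset is scored by the accuracy of a Bayesian classifier built on it. Everything here is an assumption for illustration: the data are synthetic (not Py-MS spectra), a Gaussian naive Bayes classifier stands in for the Bayesian network, and the function names, operators (tournament selection, one-point crossover, bit-flip mutation) and parameter values are hypothetical rather than the authors' choices.

```python
import math
import random

def make_data(rng, n=120, n_features=20, informative=(2, 7)):
    """Synthetic two-class data; only the `informative` columns depend on the label."""
    X, y = [], []
    for i in range(n):
        label = i % 2
        row = [rng.gauss(0.0, 1.0) for _ in range(n_features)]
        for j in informative:
            row[j] += 3.0 * label
        X.append(row)
        y.append(label)
    return X, y

def nb_accuracy(X_tr, y_tr, X_te, y_te, cols):
    """Hold-out accuracy of Gaussian naive Bayes restricted to columns `cols`."""
    if not cols:
        return 0.0
    stats = {}
    for c in set(y_tr):
        rows = [x for x, t in zip(X_tr, y_tr) if t == c]
        params = []
        for j in cols:
            vals = [r[j] for r in rows]
            mu = sum(vals) / len(vals)
            var = sum((v - mu) ** 2 for v in vals) / len(vals) + 1e-6
            params.append((mu, var))
        stats[c] = (math.log(len(rows) / len(y_tr)), params)

    def log_posterior(x, c):
        prior, params = stats[c]
        return prior + sum(
            -0.5 * math.log(2 * math.pi * var) - (x[j] - mu) ** 2 / (2 * var)
            for j, (mu, var) in zip(cols, params)
        )

    correct = 0
    for x, t in zip(X_te, y_te):
        pred = max(stats, key=lambda c: log_posterior(x, c))
        correct += pred == t
    return correct / len(y_te)

def fitness(mask, data, penalty=0.01):
    """Accuracy minus a small cost per selected feature, to favour compact subsets."""
    X_tr, y_tr, X_te, y_te = data
    cols = [j for j, bit in enumerate(mask) if bit]
    return nb_accuracy(X_tr, y_tr, X_te, y_te, cols) - penalty * len(cols)

def ga_select(data, n_features, rng, pop_size=30, generations=25, p_mut=0.05):
    """Evolve boolean feature masks and return the indices of the best mask found."""
    pop = [[rng.random() < 0.2 for _ in range(n_features)] for _ in range(pop_size)]
    best = max(pop, key=lambda m: fitness(m, data))
    for _ in range(generations):
        new_pop = [best[:]]  # elitism: carry the best mask forward unchanged
        while len(new_pop) < pop_size:
            # tournament selection of two parents
            p1 = max(rng.sample(pop, 3), key=lambda m: fitness(m, data))
            p2 = max(rng.sample(pop, 3), key=lambda m: fitness(m, data))
            cut = rng.randrange(1, n_features)  # one-point crossover
            child = p1[:cut] + p2[cut:]
            child = [(not b) if rng.random() < p_mut else b for b in child]
            new_pop.append(child)
        pop = new_pop
        best = max(pop, key=lambda m: fitness(m, data))
    return [j for j, bit in enumerate(best) if bit]

rng = random.Random(0)
X, y = make_data(rng)
data = (X[:80], y[:80], X[80:], y[80:])  # simple train/test split
selected = ga_select(data, 20, rng)
print("selected feature indices:", selected)
```

The per-feature penalty in `fitness` is one simple way to push the search towards a small subset, in the spirit of the parsimonious model the abstract describes; a real study would also use cross-validation rather than a single hold-out split.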
【 License 】
CC BY
© Correa and Goodacre; licensee BioMed Central Ltd. 2011