BMC Bioinformatics
Clustering with position-specific constraints on variance: Applying redescending M-estimators to label-free LC-MS data analysis
Research Article
D R Mani [1], Saumyadipta Pyne [2], Rudolf Frühwirth [3]
[1] Broad Institute of MIT and Harvard University, Cambridge, MA, USA; Broad Institute of MIT and Harvard University, Cambridge, MA, USA; Dana-Farber Cancer Institute, Harvard Medical School, Boston, MA, USA; Institute of High Energy Physics, Austrian Academy of Sciences, Vienna, Austria
Keywords: Cluster Center; Peak Match; DEterministic Annealing; Constraint Enforcement; Landmark Match
DOI: 10.1186/1471-2105-12-358
Received 2011-06-08, accepted 2011-08-31, published 2011
Source: Springer
【 Abstract 】
Background: Clustering is a widely applicable pattern recognition method for discovering groups of similar observations in data. While there is a large variety of clustering algorithms, very few of them can enforce constraints on the variation of attributes for data points included in a given cluster. In particular, a clustering algorithm that can limit variation within a cluster according to that cluster's position (centroid location) can produce effective and optimal results in many important applications, ranging from clustering of silicon pixels or calorimeter cells in high-energy physics to label-free liquid chromatography based mass spectrometry (LC-MS) data analysis in proteomics and metabolomics.

Results: We present MEDEA (M-Estimator with DEterministic Annealing), a new unsupervised, M-estimator-based algorithm designed to enforce position-specific constraints on variance during the clustering process. The utility of MEDEA is demonstrated by applying it to the problem of "peak matching" (identifying the common LC-MS peaks across multiple samples) in proteomic biomarker discovery. Using real-life datasets, we show that MEDEA not only outperforms current state-of-the-art model-based clustering methods, but also results in an implementation that is significantly more efficient, and hence applicable to much larger LC-MS data sets.

Conclusions: MEDEA is an effective and efficient solution to the problem of peak matching in label-free LC-MS data. The program implementing the MEDEA algorithm, including datasets, clustering results, and supplementary information, is available from the author website at http://www.hephy.at/user/fru/medea/.
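The abstract does not give MEDEA's equations, so the following is only a minimal sketch of the general idea it names: a redescending M-estimator whose scale is tied to the cluster's position and relaxed by deterministic annealing. Everything specific here is an assumption made for illustration and is not taken from the paper: the Tukey biweight weight function, the ppm-style tolerance in `sigma_at`, the cooling schedule, and all function and parameter names.

```python
import numpy as np

def tukey_weight(r):
    """Tukey biweight: weights fall smoothly to exactly 0 for |r| >= 1 (redescending)."""
    return np.where(np.abs(r) < 1.0, (1.0 - r**2) ** 2, 0.0)

def sigma_at(center, ppm=10.0):
    """Hypothetical position-specific scale: allowed spread grows with the center (m/z)."""
    return center * ppm * 1e-6

def anneal_center(x, start, ppm=10.0, t_start=64.0, cooling=0.5,
                  cutoff=3.0, n_inner=20):
    """Estimate one cluster center by iteratively reweighted means while cooling T to 1."""
    x = np.asarray(x, dtype=float)
    center = float(start)
    t = t_start
    while True:
        for _ in range(n_inner):
            # The variance is inflated by T, so the scale is inflated by sqrt(T).
            scale = cutoff * sigma_at(center, ppm) * np.sqrt(t)
            w = tukey_weight((x - center) / scale)
            if w.sum() == 0.0:
                break  # no point lies within the cutoff; keep the current center
            center = float(np.sum(w * x) / np.sum(w))
        if t <= 1.0:
            return center
        t = max(1.0, t * cooling)

# Five peaks within about 2 ppm of m/z 500, plus one peak 100 ppm away.
peaks = np.array([499.999, 499.9995, 500.0, 500.0005, 500.001, 500.05])
center = anneal_center(peaks, start=500.01)
print(center, peaks.mean())
```

At high temperature the distant peak still pulls on the estimate. As the temperature falls, the peak drops outside the position-dependent cutoff and receives zero weight, so the final center is determined by the tight group alone, whereas the plain mean stays biased toward the outlier.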
【 License 】
Unknown
© Frühwirth et al; licensee BioMed Central Ltd. 2011. This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.