BMC Bioinformatics
Stronger findings from mass spectral data through multi-peak modeling
Tommi Suvitaival [3], Simon Rogers [2], Samuel Kaski [1]
[1] Helsinki Institute for Information Technology HIIT, Department of Computer Science, University of Helsinki, Helsinki, Finland
[2] School of Computing Science, University of Glasgow, G12 8QQ, Glasgow, UK
[3] Helsinki Institute for Information Technology HIIT, Department of Information and Computer Science, Aalto University, 00076 Espoo, Finland
Keywords: Nonparametric Bayes; Lipidomics; Metabolomics; Mass spectrometry; Clustering; Bayesian modeling; ANOVA-type modeling
DOI: 10.1186/1471-2105-15-208
Received 2014-03-19; accepted 2014-06-12; published 2014
【 Abstract 】
Background
Mass spectrometry-based metabolomic analysis depends upon the identification of spectral peaks by their mass and retention time. Statistical analysis that follows the identification currently relies on one main peak of each compound. However, a compound present in the sample typically produces several spectral peaks due to its isotopic properties and the ionization process of the mass spectrometer device. In this work, we investigate the extent to which these additional peaks can be used to increase the statistical strength of differential analysis.
Results
We present a Bayesian approach for integrating data of multiple detected peaks that come from one compound. We demonstrate the approach through a simulated experiment and validate it on ultra performance liquid chromatography-mass spectrometry (UPLC-MS) experiments for metabolomics and lipidomics. Peaks that are likely to be associated with one compound can be clustered by the similarity of their chromatographic shape. Changes of concentration between sample groups can be inferred more accurately when multiple peaks are available.
Conclusions
When the sample size is limited, the proposed multi-peak approach improves the accuracy of inferring covariate effects. An R implementation and data are available at http://research.ics.aalto.fi/mi/software/peakANOVA/.
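The idea of grouping peaks by the similarity of their chromatographic shape can be illustrated with a minimal sketch. This is not the authors' nonparametric Bayesian model: Pearson correlation with a greedy threshold rule is swapped in as a simple stand-in, and the function names (`pearson`, `cluster_peaks_by_shape`, `gaussian_shape`), the threshold of 0.9, and the synthetic Gaussian peak shapes are all assumptions made for illustration.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length intensity profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # a flat profile carries no shape information
    return cov / (sx * sy)

def cluster_peaks_by_shape(shapes, threshold=0.9):
    """Greedy grouping (illustrative stand-in, not the paper's method):
    a peak joins the first cluster whose first member it correlates
    with at or above `threshold`; otherwise it starts a new cluster."""
    clusters = []
    for i, shape in enumerate(shapes):
        for members in clusters:
            if pearson(shapes[members[0]], shape) >= threshold:
                members.append(i)
                break
        else:
            clusters.append([i])
    return clusters

def gaussian_shape(center, width, scale, times):
    """Synthetic chromatographic peak sampled at the given retention times."""
    return [scale * math.exp(-0.5 * ((t - center) / width) ** 2) for t in times]

# Hypothetical data: two compounds, each producing a main peak and a
# weaker secondary peak (e.g. isotope or adduct) with the same elution shape.
times = [0.1 * k for k in range(50)]
shapes = [
    gaussian_shape(2.0, 0.3, 100.0, times),  # compound A, main peak
    gaussian_shape(2.0, 0.3, 12.0, times),   # compound A, secondary peak
    gaussian_shape(3.5, 0.3, 80.0, times),   # compound B, main peak
    gaussian_shape(3.5, 0.3, 9.0, times),    # compound B, secondary peak
]
clusters = cluster_peaks_by_shape(shapes)
print(clusters)
```

In this toy setting the peaks that co-elute with the same shape end up in the same group regardless of their absolute intensity, which is the property that lets weaker peaks contribute evidence alongside the main peak of a compound.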
【 License 】
2014 Suvitaival et al.; licensee BioMed Central Ltd.