| Journal of Clinical Bioinformatics | |
| Availability of MudPIT data for classification of biological samples | |
| Pierluigi Mauri [2], Giancarlo Mauri [1], Valeria Bellettato [2], Francesca Brambilla [2], Italo Zoppis [1], Dario Di Silvestre [2] | |
| [1] Department of Informatics, Systems and Communication, Viale Sarca 336, University of Milano-Bicocca, Milan, Italy; [2] Institute for Biomedical Technologies (ITB-CNR), via F.lli Cervi 93, Segrate (Milan), Italy | |
| Keywords: Label-free quantification; Clinical proteomics; SVM; MudPIT; Sample classification | |
| Others: 804215; DOI: 10.1186/2043-9113-3-1 | |
| Received 2012-10-02, accepted 2013-01-07, published 2013 | |
【 Abstract 】
Background
Mass spectrometry is an important analytical tool for clinical proteomics. Primarily employed for biomarker discovery, it is increasingly used to develop methods that may help provide an unambiguous diagnosis of biological samples. In this context, we investigated the classification of phenotypes by applying support vector machines (SVM) to experimental data obtained by the MudPIT approach. In particular, we compared the performance of SVM using two independent collections of complex samples and different data types, such as mass spectra (m/z), peptides and proteins.
Results
Overall, protein and peptide data carried more discriminant information than experimental mass spectra (overall accuracy higher than 87% in both collections 1 and 2). These results indicate that sequencing of peptides and proteins reduces the experimental noise affecting the raw mass spectra and allows the extraction of more informative features for effective sample classification. In addition, the protein and peptide features selected by SVM showed an 80% match with the differentially expressed proteins identified by the MAProMa software.
Conclusions
These findings confirm the availability of most label-free quantitative methods based on the processing of spectral count and SEQUEST-based SCORE values. They also stress the usefulness of MudPIT data for correctly grouping sample phenotypes by applying both supervised and unsupervised learning algorithms. This capacity permits the evaluation of actual samples and is a good starting point for translating proteomic methodology to clinical application.
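The classification step summarized above can be illustrated with a small, self-contained sketch. The abstract does not state the kernel, software, or preprocessing used, so everything here is an assumption made for illustration: the spectral-count values are hypothetical, the classifier is a plain linear SVM trained by batch subgradient descent on the hinge loss, and ranking proteins by absolute weight is only a crude stand-in for SVM-based feature selection, not the authors' procedure.

```python
# Hypothetical spectral-count matrix: rows are samples, columns are proteins.
counts = [
    [12, 0, 5, 7],   # phenotype A
    [10, 1, 6, 8],
    [11, 0, 4, 7],
    [2, 9, 5, 8],    # phenotype B
    [1, 11, 6, 7],
    [3, 10, 5, 6],
]
labels = [1, 1, 1, -1, -1, -1]  # +1 = phenotype A, -1 = phenotype B


def column_means(rows):
    """Mean spectral count of each protein across the training samples."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def center(row, means):
    """Subtract the training-set mean from each feature."""
    return [v - m for v, m in zip(row, means)]


def decision(w, b, x):
    """Signed score of the linear classifier, w.x + b."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b


def train_linear_svm(xs, ys, lam=0.01, lr=0.01, epochs=200):
    """Batch subgradient descent on lam/2 * ||w||^2 + mean hinge loss."""
    n, d = len(xs), len(xs[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]  # gradient of the regularizer
        grad_b = 0.0
        for x, y in zip(xs, ys):
            if y * decision(w, b, x) < 1:  # sample violates the margin
                for j in range(d):
                    grad_w[j] -= y * x[j] / n
                grad_b -= y / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict(w, b, x):
    """Return +1 (phenotype A) or -1 (phenotype B)."""
    return 1 if decision(w, b, x) >= 0 else -1


means = column_means(counts)
train_x = [center(row, means) for row in counts]
w, b = train_linear_svm(train_x, labels)

train_pred = [predict(w, b, x) for x in train_x]
# Proteins ordered by absolute weight, largest first (assumed ranking rule).
ranking = sorted(range(len(w)), key=lambda j: abs(w[j]), reverse=True)
```

In this toy matrix only the first two proteins differ between phenotypes, so they receive the largest weights. Real MudPIT data would need held-out or cross-validated evaluation, since accuracy on the training samples alone says little about generalization.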
【 License 】
2013 Di Silvestre et al.; licensee BioMed Central Ltd.