BMC Bioinformatics
APP: an Automated Proteomics Pipeline for the analysis of mass spectrometry data based on multiple open access tools
Erik K Malm [1], Vaibhav Srivastava [1], Gustav Sundqvist [1], Vincent Bulone [1]
[1] Division of Glycoscience, School of Biotechnology, Royal Institute of Technology (KTH), AlbaNova University Centre, Stockholm, Sweden
Keywords: Distributed processing; Validation; Automation; Proteomics
DOI: 10.1186/s12859-014-0441-8
Received 2014-08-17, accepted 2014-12-18, published 2014
【 Abstract 】
Background
Mass spectrometry analyses of complex protein samples yield large amounts of data, and analyzing these data requires both specific expertise and a dedicated computer infrastructure. Furthermore, identifying proteins and their specific properties requires the use of multiple independent bioinformatics tools and several database search algorithms to process the same datasets. To facilitate and speed up data analysis, there is a need for an integrated platform that allows comprehensive profiling of thousands of peptides and proteins in a single process through the simultaneous use of multiple complementary algorithms.
Results
We have established a new proteomics pipeline, designated APP, that fulfills these objectives using a complete series of freely available open source tools. APP automates proteomics tasks such as peptide identification, validation and quantitation from LC-MS/MS data, and allows easy integration of many separate proteomics tools. Distributed processing is at the core of APP, allowing very large datasets to be processed using any combination of Windows/Linux physical or virtual computing resources.
Conclusions
APP provides distributed computing nodes that are simple to set up, greatly relieving the need for separate IT competence when handling large datasets. The modular nature of APP allows complex workflows to be managed and distributed, speeding up throughput and setup. Additionally, APP logs execution information on all executed tasks and generated results, simplifying information management and validation.
【 License 】
2014 Malm et al.; licensee BioMed Central.