BMC Bioinformatics
Scientific workflow optimization for improved peptide and protein identification
Sonja Holl [2], Yassene Mohammed [1], Olav Zimmermann [2], Magnus Palmblad [1]
[1] Center for Proteomics and Metabolomics, Leiden University Medical Center, Leiden, 2300 RC, The Netherlands
[2] Jülich Supercomputing Centre (JSC), Forschungszentrum Jülich, Jülich, 52425, Germany
Keywords: X!Tandem; Tandem mass spectrometry; Scientific workflow; Optimization; Taverna workbench
DOI: 10.1186/s12859-015-0714-x
Received 2015-03-17, accepted 2015-08-24, published 2015
【 Abstract 】
Background
Peptide-spectrum matching is a common step in most data processing workflows for mass spectrometry-based proteomics. Many algorithms and software packages, both free and commercial, have been developed to address this task. However, these algorithms typically require the user to select instrument- and sample-dependent parameters, such as mass measurement error tolerances and the number of missed enzymatic cleavages. Selecting the best algorithm and parameter set for a particular dataset requires in-depth knowledge of both the data and the algorithms themselves. Most researchers therefore tend to use default parameters, which are not necessarily optimal.
Results
We have applied a new optimization framework for the Taverna scientific workflow management system (http://ms-utils.org/Taverna_Optimization.pdf) to find the best combination of parameters for a given scientific workflow that performs peptide-spectrum matching. The optimizations themselves are non-trivial, as demonstrated by several phenomena that can be observed when allowing for larger mass measurement errors in sequence database searches. On-the-fly parameter optimization embedded in scientific workflow management systems enables experts and non-experts alike to extract the maximum amount of information from the data. The same workflows could be used for exploring the parameter space and comparing algorithms, not only for peptide-spectrum matching, but also for other tasks, such as retention time prediction.
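As a rough illustration of what searching a parameter space for peptide-spectrum matching can look like, the following minimal Python sketch runs an exhaustive grid search over two parameters of the kind named above (a precursor mass tolerance and the number of missed cleavages). It is not the Taverna optimization framework, and grid search is used here only as the simplest stand-in for whatever search strategy that framework applies. The names `grid_search`, `toy_objective`, `precursor_tolerance_ppm`, and `missed_cleavages` are hypothetical, and the objective is a made-up function; in practice it would be replaced by a workflow run that returns, for example, the number of identifications at a fixed false discovery rate.

```python
from itertools import product


def grid_search(objective, grid):
    """Evaluate `objective` on every parameter combination in `grid`.

    `grid` maps parameter names to lists of candidate values.
    Returns the best-scoring parameter dict and its score.
    """
    names = sorted(grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        score = objective(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


def toy_objective(params):
    """Hypothetical stand-in for running a search workflow.

    A real objective would execute the peptide-spectrum matching
    workflow with `params` and return a quality measure. This toy
    function simply peaks at 10 ppm and 1 missed cleavage.
    """
    tol = params["precursor_tolerance_ppm"]
    missed = params["missed_cleavages"]
    return -(tol - 10) ** 2 - 5 * (missed - 1) ** 2


# Assumed candidate values, chosen only for illustration.
grid = {
    "precursor_tolerance_ppm": [5, 10, 20, 50],
    "missed_cleavages": [0, 1, 2],
}

best_params, best_score = grid_search(toy_objective, grid)
print(best_params, best_score)
```

Exhaustive search scales poorly as parameters are added, because every combination costs one full workflow run, which is one reason to prefer a guided search when each evaluation is expensive.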
Conclusion
Using the optimization framework, we were able to learn both how the data was acquired and how the explored algorithms behave. We observed a phenomenon in which many ammonia-loss b-ion spectra were identified as peptides with N-terminal pyroglutamate and a large precursor mass measurement error. These insights could only be gained because the optimization framework explored mass measurement error tolerances beyond the commonly used range.
【 License 】
2015 Holl et al.