BMC Research Notes
Fit-for-purpose curated database application in mass spectrometry-based targeted protein identification and validation
J David Knox [3], Gehua Wang [4], Timothy R Bowden [1], Shawn Babiuk [5], Stuart McCorrister [4], Angela Sloan [4], Keding Cheng [2]
[1] Commonwealth Scientific and Industrial Research Organisation, Animal, Food and Health Sciences, Australian Animal Health Laboratory, Private Bag 24, Geelong, Victoria 3220, Australia
[2] Department of Human Anatomy and Cell Sciences, Faculty of Medicine, University of Manitoba, 745 Bannatyne Avenue, Winnipeg, Manitoba R3E 0J9, Canada
[3] Department of Medical Microbiology, Faculty of Medicine, University of Manitoba, 745 Bannatyne Avenue, Winnipeg, Manitoba R3E 0J9, Canada
[4] National Microbiology Laboratory, Public Health Agency of Canada, 1015 Arlington Street, Winnipeg, Manitoba R3E 3R2, Canada
[5] Department of Immunology, Faculty of Medicine, University of Manitoba, 471 Apotex Centre, 750 McDermot Avenue, Winnipeg, MB R3E 0T5, Canada
Keywords: Recombinant prion protein; Tau; Flagellar typing; Sheeppox virus; Targeted protein identification; Curated database
DOI: 10.1186/1756-0500-7-444
Received 2014-01-28; accepted 2014-07-01; published 2014
【 Abstract 】
Background
Mass spectrometry (MS) is a highly sensitive and specific method for protein identification, biomarker discovery, and biomarker validation. Protein identification is commonly carried out by comparing MS data with public databases. However, with the development of high-throughput, accurate genomic sequencing technology, public databases are being overwhelmed with new entries from different species every day. Using these databases can also be problematic because of their size, limited specificity, and unharmonized annotation of the molecules of interest. Databases currently used for liquid chromatography-tandem mass spectrometry (LC-MS/MS)-based searches focus on enzyme digestion patterns and sequence information; consequently, important functional information can be missing from the search output. Protein variants with high sequence homology can interfere with database identification when only certain homologues are examined. In addition, recombinant DNA technology can yield products that are not accurately annotated in public databases. Curated databases, which focus on the molecule of interest and provide clearer functional annotation and sequence information, are necessary for accurate protein identification and validation. Here, four cases of curated database application are explored and summarized.
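As a rough illustration of the ideas in this Background, the Python sketch below writes a curated FASTA entry whose header carries an explicit functional annotation, then digests the sequence in silico. The accession, description, and sequence are hypothetical placeholders, not entries from the study, and the cleavage rule (after K or R, but not before P) is a common simplification of trypsin specificity rather than the authors' actual search settings.

```python
def fasta_entry(accession, description, sequence, width=60):
    """Format one FASTA record with an annotated header line."""
    lines = [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return f">{accession} {description}\n" + "\n".join(lines) + "\n"


def tryptic_peptides(sequence, min_length=6):
    """Simplified in-silico trypsin digest: cleave after K/R unless followed by P."""
    peptides, start = [], 0
    for i, residue in enumerate(sequence):
        next_residue = sequence[i + 1] if i + 1 < len(sequence) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        # Keep the C-terminal piece that does not end in K or R.
        peptides.append(sequence[start:])
    return [p for p in peptides if len(p) >= min_length]


# Hypothetical curated entry (placeholder accession, annotation, and sequence).
toy_sequence = "MAEPRQEFEVMEDHAGTYGLGDRKDQGGYTMHQDQEGDTDAGLK"
record = fasta_entry(
    "CUR0001",
    "toy protein, variant 2 | function: example annotation (hypothetical)",
    toy_sequence,
)
print(record)
print(tryptic_peptides(toy_sequence))
```

Keeping the functional annotation in the header means it travels with the sequence into the search output, which is the property the curated databases described here rely on.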
Findings
The four curated databases presented were constructed with clear application goals and proved very useful for targeted protein identification and biomarker application in different fields. They include: a sheeppox virus database created for accurate identification of proteins with strong antigenicity; a custom database containing clearly annotated protein variants, such as tau transcript variant 2, for accurate biomarker identification; a sheep-hamster chimeric prion protein (PrP) database constructed for assay development for prion diseases; and a custom Escherichia coli (E. coli) flagella (H antigen) database produced for MS-H, a new H-typing technique. Clearly annotating the proteins of interest was essential for highly accurate, specific, and sensitive sequence identification. Searching against public databases resulted in inaccurate identification of the sequence of interest, while combining the curated database with a public database reduced both the confidence and the sequence coverage of the protein search.
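Sequence coverage, one of the quantities compared in these Findings, is the fraction of a protein's residues covered by at least one identified peptide. The following Python sketch shows one simple way to compute it. The protein and peptide strings are hypothetical toy data, and exact substring matching is an assumption made for illustration; real search engines also handle modifications and score each match.

```python
def sequence_coverage(protein, peptides):
    """Fraction of residues in `protein` covered by at least one peptide."""
    if not protein:
        return 0.0
    covered = [False] * len(protein)
    for peptide in peptides:
        if not peptide:
            continue
        start = protein.find(peptide)
        while start != -1:
            # Mark every residue spanned by this occurrence of the peptide.
            for i in range(start, start + len(peptide)):
                covered[i] = True
            start = protein.find(peptide, start + 1)
    return sum(covered) / len(protein)


# Toy example (hypothetical sequence and peptide identifications).
protein = "AAKBBRPCCK"
identified = ["AAK"]
print(f"coverage: {sequence_coverage(protein, identified):.0%}")
```

Computing coverage against a single annotated entry, rather than against many near-identical homologues, makes it easier to tell which variant the identified peptides actually support.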
Conclusion
Curated protein sequence databases incorporating clear annotations are very useful for accurate protein identification and fit-for-purpose application through MS-based biomarker validation.
【 License 】
2014 Cheng et al.; licensee BioMed Central Ltd.