BMC Bioinformatics
PGA: an R/Bioconductor package for identification of novel peptides using a customized database derived from RNA-Seq
Software
Xin Liu [1], Shaohang Xu [1], Ruo Zhou [1], Bo Wen [1], Xun Xu [1], Siqi Liu [2], Bing Zhang [3], Xiaojing Wang [3]
[1] BGI-Shenzhen, 518083, Shenzhen, China; [2] Beijing Institute of Genomics, Chinese Academy of Sciences, 100101, Beijing, China; [3] Department of Biomedical Informatics, Vanderbilt University School of Medicine, 37232, Nashville, TN, USA
Keywords: Proteomics; RNA-Seq; MS/MS; Peptide identification; Proteogenomics
DOI: 10.1186/s12859-016-1133-3
Received 2015-06-27, accepted 2016-06-09, published 2016
Source: Springer
【 Abstract 】
Background: Peptide identification based on mass spectrometry (MS) is generally achieved by comparing experimental mass spectra with theoretically digested peptides derived from a reference protein database. This strategy cannot identify peptide and protein sequences that are absent from the reference database. A customized protein database built from RNA-Seq data has therefore been proposed to assist with and improve the identification of novel peptides. Correspondingly, a comprehensive pipeline that provides an end-to-end solution for novel peptide detection with the customized protein database is needed.

Results: A pipeline implemented as an R package, named PGA, was developed. It enables automated processing of tandem mass spectrometry (MS/MS) data acquired from different MS platforms and construction of customized protein databases from RNA-Seq data, with or without a reference genome as a guide. PGA identifies novel peptides and generates an HTML-based report with a visual interface. Applied to a published dataset, PGA identified 636 novel peptides, including 510 single amino acid polymorphism (SAP) peptides, 2 INDEL peptides, 49 splice junction peptides, and 75 novel transcript-derived peptides. The software is freely available from http://bioconductor.org/packages/PGA/, and example reports are available at http://wenbostar.github.io/PGA/.

Conclusions: The PGA pipeline, designed to be platform-independent and easy to use, was successfully developed and shown to be capable of identifying novel peptides by searching a customized protein database derived from RNA-Seq data.
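The limitation described in the Background can be illustrated with a toy sketch. This is not PGA's implementation and does not use its API: the sequences are invented, the function name `tryptic_digest` is hypothetical, and trypsin-style cleavage (after K or R, not before P) is an assumption, since the abstract does not name the enzyme. The sketch digests a reference protein and a variant carrying one amino acid substitution, then shows that the variant peptide exists only in the customized database.

```python
import re

def tryptic_digest(protein, min_len=6):
    """Toy in-silico digest: cleave after K or R unless the next residue is P.

    Returns the set of peptides with at least min_len residues.
    """
    pieces = re.split(r"(?<=[KR])(?!P)", protein)
    return {p for p in pieces if len(p) >= min_len}

# Invented sequences for illustration only.
reference = "MAGTLLEKSVDPRAAELTYGKWNDR"
# Same protein with a single substitution (A -> V), as a SAP might produce.
customized = "MAGTLLEKSVDPRAVELTYGKWNDR"

reference_peptides = tryptic_digest(reference)
customized_peptides = tryptic_digest(customized)

# Peptides that a search against the reference database alone could never match.
novel = customized_peptides - reference_peptides
print(sorted(novel))  # ['AVELTYGK']
```

A spectrum generated by the variant peptide has no matching candidate in the reference digest, which is why a sample-specific database is needed to identify it.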
【 License 】
CC BY
© The Author(s) 2016