BMC Bioinformatics
String kernels for protein sequence comparisons: improved fold recognition
Methodology Article
Saghi Nojoomi [1], Patrice Koehl [2]
[1] Biotechnology program, University of California, Davis, 1, Shields Avenue, 95616, Davis, CA, USA; [2] Department of Computer Science and Genome Center, 1, Shields Avenue, 95616, Davis, CA, USA
Keywords: Protein sequence; Kernel; Alignment free methods
DOI: 10.1186/s12859-017-1560-9
Received 2016-10-21, accepted 2017-02-23, published 2017
Source: Springer
【 Abstract 】
Background: The amino acid sequence of a protein is the blueprint from which its structure, and ultimately its function, can be derived. Sequence comparison methods therefore remain essential for determining similarity between proteins. Traditional approaches for comparing two protein sequences start from the strings of letters (amino acids) that represent the sequences, generate textual alignments between these strings, and assign a score to each alignment. When the similarity between the two protein sequences is low, however, the quality of the corresponding sequence alignment is usually poor, which leads to poor performance in recognizing similarity.
Results: In this study, we develop an alignment-free alternative to these methods that is based on the concept of string kernels. Starting from recently proposed kernels on the discrete space of protein sequences (Shen et al., Found. Comput. Math., 2013, 14:951-984), we introduce our own version, SeqKernel. Its implementation depends on two parameters: a coefficient that tunes the substitution matrix, and the maximum length of the k-mers that it includes. We provide an exhaustive analysis of the impact of these two parameters on the performance of SeqKernel for fold recognition. We show that, with the right choice of parameters, the SeqKernel similarity measure improves fold recognition compared to traditional alignment-based methods. We illustrate the application of SeqKernel to inferring phylogeny on RNA polymerases and show that it performs as well as methods based on multiple sequence alignments.
Conclusion: We have presented and characterized a new alignment-free method, based on a mathematical kernel, for scoring the similarity of protein sequences. We discuss possible improvements of this method, as well as an extension of its applications to other modeling methods that rely on sequence comparison.
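To make the role of the two parameters concrete, the following is a minimal Python sketch of a k-mer string kernel of the general kind described above. It is an illustration under stated assumptions, not the SeqKernel implementation: the three-letter alphabet, the `TOY_SCORES` values, the function names, and the exact form of the kernel (a sum over all pairs of k-mers of products of letter similarities, normalized so that a sequence has similarity 1 with itself) are all chosen here for the example. The parameter `beta` stands in for the coefficient that tunes the substitution matrix, and `k_max` for the maximum k-mer length.

```python
import math

# Toy letter similarities on a 3-letter alphabet. These values are hypothetical
# and are NOT a real substitution matrix such as BLOSUM62.
TOY_SCORES = {
    ("A", "A"): 1.0, ("A", "C"): 0.2, ("A", "D"): 0.1,
    ("C", "C"): 1.0, ("C", "D"): 0.3,
    ("D", "D"): 1.0,
}


def letter_kernel(a, b, beta):
    """Similarity of two letters: the toy score raised to the power beta."""
    score = TOY_SCORES.get((a, b), TOY_SCORES.get((b, a)))
    return score ** beta


def raw_kernel(f, g, beta, k_max):
    """Sum, over k = 1..k_max and over all pairs of k-mers (one from f, one
    from g), of the product of letter similarities at matching positions."""
    total = 0.0
    for k in range(1, k_max + 1):
        for i in range(len(f) - k + 1):
            for j in range(len(g) - k + 1):
                prod = 1.0
                for a, b in zip(f[i:i + k], g[j:j + k]):
                    prod *= letter_kernel(a, b, beta)
                total += prod
    return total


def seq_similarity(f, g, beta=1.0, k_max=3):
    """Normalized kernel: equals 1 when a sequence is compared with itself."""
    norm = math.sqrt(raw_kernel(f, f, beta, k_max) * raw_kernel(g, g, beta, k_max))
    return raw_kernel(f, g, beta, k_max) / norm


if __name__ == "__main__":
    print(seq_similarity("ACD", "ACD"))
    print(seq_similarity("ACD", "DCA", beta=1.0, k_max=3))
```

In this sketch no alignment is ever built: every k-mer of one sequence is compared with every k-mer of the other. Raising `beta` sharpens the contrast between identical and non-identical letters, while raising `k_max` lets longer shared segments contribute to the score. The triple loop is quadratic in sequence length for each k and is written for readability rather than speed.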
【 License 】
CC BY
© The Author(s) 2017