BMC Bioinformatics
PFASUM: a substitution matrix from Pfam structural alignments
Research Article
Frank Keul [1], Kay Hamacher [1], Michael Goesele [2], Martin Hess [2]
[1] Computational Biology and Simulation, Department of Biology, Technische Universität Darmstadt, Schnittspahnstraße 2, 64287, Darmstadt, Germany; [2] Graphics, Capture and Massively Parallel Computing, Department of Computer Science, Technische Universität Darmstadt, Rundeturmstraße 12, 64283, Darmstadt, Germany
Keywords: Substitution matrix; PFASUM; Homologous sequence search; Sequence alignment
DOI: 10.1186/s12859-017-1703-z
Received 2017-03-01; accepted 2017-05-22; published 2017
Source: Springer
【 Abstract 】

Background: Detecting homologous protein sequences and computing multiple sequence alignments (MSA) are fundamental tasks in molecular bioinformatics. These tasks usually require a substitution matrix for modeling evolutionary substitution events derived from a set of aligned sequences. Over the last years, the known sequence space increased drastically and several publications demonstrated that this can lead to significantly better performing matrices. Interestingly, matrices based on dated sequence datasets are still the de facto standard for both tasks even though their data basis may limit their capabilities.

We address these aspects by presenting a new substitution matrix series called PFASUM. These matrices are derived from Pfam seed MSAs using a novel algorithm and thus build upon expert ground truth data covering a large and diverse sequence space.

Results: We show results for two use cases. First, we tested the homology search performance of PFASUM matrices on up-to-date ASTRAL databases with varying sequence similarity. Our study shows that the usage of PFASUM matrices can lead to significantly better homology search results when compared to conventional matrices. PFASUM matrices with comparable relative entropies to the commonly used substitution matrices BLOSUM50, BLOSUM62, PAM250, VTML160 and VTML200 outperformed their corresponding counterparts in 93% of all test cases. A general assessment also comparing matrices with different relative entropies showed that PFASUM matrices delivered the best homology search performance in the test set.

Second, our results demonstrate that the usage of PFASUM matrices for MSA construction improves their quality when compared to conventional matrices. On up-to-date MSA benchmarks, at least 60% of all MSAs were reconstructed in an equal or higher quality when using MUSCLE with PFASUM31, PFASUM43 and PFASUM60 matrices instead of conventional matrices. This rate even increases to at least 76% for MSAs containing similar sequences.

Conclusions: We present the novel PFASUM substitution matrices derived from manually curated MSA ground truth data covering the currently known sequence space. Our results imply that PFASUM matrices improve homology search performance as well as MSA quality in many cases when compared to conventional substitution matrices. Hence, we encourage the usage of PFASUM matrices and especially PFASUM60 for these specific tasks.
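
The abstract refers to substitution matrices derived from aligned sequences and compares matrices by relative entropy, without spelling out either computation. The sketch below illustrates the generic log-odds construction familiar from BLOSUM-style matrices, in which observed pair frequencies in alignment columns are compared with the frequencies expected by chance. It is not the PFASUM algorithm, which the abstract only describes as novel. The function names and the toy alignment are hypothetical and chosen purely for illustration.

```python
import math
from collections import Counter
from itertools import combinations


def pair_frequencies(msa):
    """Relative frequencies of unordered residue pairs per column (gaps skipped)."""
    counts = Counter()
    for column in zip(*msa):
        residues = [r for r in column if r != "-"]
        for a, b in combinations(residues, 2):
            counts[tuple(sorted((a, b)))] += 1
    total = sum(counts.values())
    return {pair: n / total for pair, n in counts.items()}


def log_odds(msa):
    """Return (scores in bits, background frequencies, relative entropy in bits)."""
    q = pair_frequencies(msa)

    # Background frequency of each residue implied by the pair frequencies.
    p = Counter()
    for (a, b), f in q.items():
        if a == b:
            p[a] += f
        else:
            p[a] += f / 2
            p[b] += f / 2

    scores = {}
    entropy = 0.0
    for (a, b), f in q.items():
        # Chance expectation for an unordered pair.
        expected = p[a] * p[a] if a == b else 2 * p[a] * p[b]
        scores[(a, b)] = math.log2(f / expected)
        # Relative entropy: expected score per aligned pair.
        entropy += f * scores[(a, b)]
    return scores, dict(p), entropy


if __name__ == "__main__":
    toy_msa = ["ACA", "ACA", "AGA"]  # made-up alignment, illustration only
    scores, background, entropy = log_odds(toy_msa)
    for pair, score in sorted(scores.items()):
        print(pair, round(score, 3))
    print("relative entropy (bits):", round(entropy, 3))
```

In this formulation a higher relative entropy means the matrix is tuned to more closely related sequences. That is why the abstract pairs each conventional matrix with a PFASUM matrix of comparable relative entropy before comparing them.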

【 License 】

CC BY   
© The Author(s) 2017
