Journal Article Details
Molecules
CRFalign: A Sequence-Structure Alignment of Proteins Based on a Combination of HMM-HMM Comparison and Conditional Random Fields
Sung Jong Lee [1], Keehyoung Joo [2], Juyong Lee [3], In-Ho Lee [4], Sangjin Sim [5], Jooyoung Lee [6]
[1] Basic Science Institute, Changwon National University, Changwon 51140, Korea
[2] Center for Advanced Computation, Korea Institute for Advanced Study, Seoul 02455, Korea
[3] Department of Chemistry, Kangwon National University, Chuncheon 24341, Korea
[4] Korea Research Institute of Standards and Science (KRISS), Daejeon 34113, Korea
[5] NAVER CLOVA, Seongnam 13561, Korea
[6] School of Computational Sciences, Korea Institute for Advanced Study, Seoul 02455, Korea
Keywords: protein structure prediction; sequence-structure alignment; template-based modeling; conditional random fields; boosted regression trees; CASP
DOI: 10.3390/molecules27123711
Source: DOAJ
【 Abstract 】

Sequence–structure alignment of protein sequences is an important task in the template-based modeling of protein 3D structures. Building a reliable sequence–structure alignment is challenging, especially for remote-homologue target proteins. We built a sequence–structure alignment method called CRFalign, which improves upon a base alignment model based on HMM-HMM comparison by employing pairwise conditional random fields in combination with nonlinear scoring functions of structural and sequence features. The nonlinear scoring part is implemented by a set of gradient boosted regression trees. In addition to sequence profile features, various position-dependent structural features are employed, including secondary structures and solvent accessibilities. Training is performed on reference alignments at the superfamily or twilight-zone level chosen from the SABmark benchmark set. We found that the CRFalign method produces a relative improvement in average alignment accuracy on validation sets from the SABmark benchmark. We also tested CRFalign on 51 sequence–structure pairs involving 15 FM target domains of CASP14, where CRFalign improved the average modeling accuracy on these hard targets (TM-CRFalign 42.94%) compared with HHalign (TM-HHalign 39.05%) and MRFalign (TM-MRFalign 36.93%). CRFalign was incorporated into our template search framework, CRFpred, and was tested on a random set of 300 target proteins consisting of Easy, Medium and Hard sets, on which it showed reasonable template search performance.
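
As a rough illustration of the idea of aligning two proteins with a position-dependent scoring function, the sketch below runs a plain global dynamic-programming alignment in which the match score is supplied by a callable. It is not the CRFalign implementation: the function name `align`, the `match_score` callable (a hypothetical stand-in for a learned score such as the output of boosted regression trees) and the constant `gap` penalty (swapped in for the learned state transitions of a conditional random field) are all assumptions made for this example.

```python
from typing import Callable, List, Tuple


def align(n: int, m: int,
          match_score: Callable[[int, int], float],
          gap: float = -1.0) -> Tuple[float, List[Tuple[int, int]]]:
    """Global alignment of query positions 0..n-1 to template positions 0..m-1.

    match_score(i, j) is a hypothetical, position-dependent score for pairing
    query position i with template position j. Returns the best total score
    and the list of aligned (i, j) pairs.
    """
    # dp[i][j]: best score for the first i query and first j template positions.
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    # back[i][j]: 0 = match, 1 = query residue unaligned, 2 = template residue unaligned.
    back = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0], back[i][0] = i * gap, 1
    for j in range(1, m + 1):
        dp[0][j], back[0][j] = j * gap, 2

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            candidates = (
                dp[i - 1][j - 1] + match_score(i - 1, j - 1),  # match
                dp[i - 1][j] + gap,                            # skip query residue
                dp[i][j - 1] + gap,                            # skip template residue
            )
            best = max(range(3), key=lambda k: candidates[k])
            dp[i][j], back[i][j] = candidates[best], best

    # Trace back from the bottom-right corner to recover the aligned pairs.
    pairs: List[Tuple[int, int]] = []
    i, j = n, m
    while i > 0 or j > 0:
        move = back[i][j]
        if move == 0:
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif move == 1:
            i -= 1
        else:
            j -= 1
    pairs.reverse()
    return dp[n][m], pairs


# Toy example: identity scoring on two short made-up sequences.
query, template = "ACDE", "ACE"
score, pairs = align(len(query), len(template),
                     lambda i, j: 2.0 if query[i] == template[j] else -1.0)
```

In this toy run the scoring callable only compares residue letters; a learned model would instead combine profile and structural features at positions `i` and `j`.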

【 License 】

Unknown   
