Journal Article Details
BMC Bioinformatics
Protein secondary structure prediction using a small training set (compact model) combined with a Complex-valued neural network approach
Methodology Article
Andrzej Kloczkowski1  Saras Saraswathi2  Andrzej Kolinski3  Suresh Sundaram4  Shamima Rashid4 
[1] Battelle Center for Mathematical Medicine, The Research Institute at Nationwide Children’s Hospital, 700 Children’s Drive, Columbus, USA; Department of Paediatrics, College of Medicine, The Ohio State University, 370 W. 9th Avenue, Columbus, USA; Battelle Center for Mathematical Medicine, The Research Institute at Nationwide Children’s Hospital, 700 Children’s Drive, Columbus, USA; Sidra Medical and Research Center, Al Dafna, Doha, Qatar; Laboratory of Theory of Biopolymers, Faculty of Chemistry, University of Warsaw, Pasteura 1, 02-093, Warsaw, Poland; School of Computer Science and Engineering, Nanyang Technological University, 50 Nanyang Ave, 639798, Singapore, Singapore
Keywords: Secondary structure prediction; Heuristics; Complex-valued relaxation network; Inhibitor peptides; Efficient learning; Protein structure; Compact model
DOI: 10.1186/s12859-016-1209-0
Received 2015-10-07; accepted 2016-08-25; published 2016
Source: Springer
【 Abstract 】

Background
Protein secondary structure prediction (SSP) has been an area of intense research interest. Despite advances made by recent methods that use large datasets, the estimated upper limit of accuracy is yet to be reached. Since the predictions of SSP methods are applied as input to higher-level structure prediction pipelines, even small errors may cause large perturbations in final models. Previous work relied on cross-validation as an estimate of classifier accuracy. However, training on large numbers of protein chains compromises the classifier's ability to generalize to new sequences. This prompts a novel approach to training and an investigation into the possible structural factors that lead to poor predictions.

Here, a small group of 55 proteins, termed the compact model, is selected from the CB513 dataset using a heuristics-based approach. In a prior work, all sequences were represented as probability matrices of residues adopting each of the Helix, Sheet and Coil states, based on energy calculations using the C-Alpha, C-Beta, Side-chain (CABS) algorithm. The functional relationship between the conformational energies computed with the CABS force field and the residue states is approximated using a classifier termed the Fully Complex-valued Relaxation Network (FCRN). The FCRN is trained with the compact model proteins.

Results
The performance of the compact model is compared with traditional cross-validated accuracies and blind-tested on a dataset of G Switch proteins, obtaining accuracies of ~81%. The model demonstrates better results when compared to several techniques in the literature. A comparative case study of the worst-performing chain identifies hydrogen bond contacts that lead to Coil ⇔ Sheet misclassifications. Overall, mispredicted Coil residues have a higher propensity to participate in backbone hydrogen bonding than correctly predicted Coils.

Conclusions
The implications of these findings are: (i) the choice of training proteins is important in preserving the ability of a classifier to generalize and predict new sequences accurately, and (ii) SSP techniques that are sensitive in distinguishing between backbone hydrogen bonding and side-chain or water-mediated hydrogen bonding might be needed to reduce Coil ⇔ Sheet misclassifications.
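
As an illustration of the kind of figures reported above, the following minimal sketch computes a per-residue three-state accuracy and counts Coil ⇔ Sheet confusions. It is not the authors' evaluation code: the one-letter codes (H for Helix, E for Sheet, C for Coil), the function names and the toy strings are all assumptions made for this example, and the abstract does not specify how its accuracy was computed.

```python
from collections import Counter


def three_state_accuracy(true_states, predicted_states):
    """Fraction of residues whose predicted state matches the reference state."""
    if len(true_states) != len(predicted_states):
        raise ValueError("sequences must have equal length")
    if not true_states:
        raise ValueError("sequences must be non-empty")
    matches = sum(t == p for t, p in zip(true_states, predicted_states))
    return matches / len(true_states)


def confusion_counts(true_states, predicted_states):
    """Count (true, predicted) pairs; ('C', 'E') is Coil mispredicted as Sheet."""
    return Counter(zip(true_states, predicted_states))


# Hypothetical toy strings (assumed H/E/C encoding), not data from the paper.
true_ss = "CCHHHHCCEEEECC"
pred_ss = "CCHHHHCEEEEECC"

accuracy = three_state_accuracy(true_ss, pred_ss)
counts = confusion_counts(true_ss, pred_ss)
print(f"accuracy = {accuracy:.3f}")
print("Coil -> Sheet errors:", counts[("C", "E")])
print("Sheet -> Coil errors:", counts[("E", "C")])
```

Counting the two off-diagonal pairs separately shows which direction of the Coil ⇔ Sheet confusion dominates for a given chain.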

【 License 】

CC BY   
© The Author(s) 2016
