Journal Article Details
BMC Genomics
PNImodeler: web server for inferring protein-binding nucleotides from sequence data
Proceedings
Jinyong Im [1], Narankhuu Tuvshinjargal [1], Byungkyu Park [1], Wook Lee [1], Kyungsook Han [1], De-Shuang Huang [2]
[1] Department of Computer Science and Engineering, Inha University, Incheon, South Korea; [2] Machine Learning and Systems Biology Lab, College of Electronics and Information Engineering, Tongji University, 201804, Shanghai, China
Keywords: Support Vector Machine; Positive Predictive Value; Negative Predictive Value; Support Vector Machine Model; Matthews Correlation Coefficient
DOI: 10.1186/1471-2164-16-S3-S6
Source: Springer
【 Abstract 】

Background: Interactions between DNA and proteins are essential to many biological processes such as transcriptional regulation and DNA replication. With the increased availability of structures of protein-DNA complexes, several computational studies have been conducted to predict DNA binding sites in proteins. However, few attempts have been made to predict protein binding sites in DNA.

Results: From an extensive analysis of protein-DNA complexes, we identified powerful features of DNA and protein sequences that can be used to predict protein binding sites in DNA sequences. We developed two support vector machine (SVM) models that predict protein-binding nucleotides from DNA and/or protein sequences. One SVM model, which used DNA sequence data alone, achieved a sensitivity of 73.4%, a specificity of 64.8%, an accuracy of 68.9% and a correlation coefficient of 0.382 on a test dataset that was not used in training. The other SVM model, which used both DNA and protein sequences, achieved a sensitivity of 67.6%, a specificity of 74.3%, an accuracy of 71.4% and a correlation coefficient of 0.418.

Conclusions: Predicting binding sites in double-stranded DNAs is a more difficult task than predicting binding sites in single-stranded molecules. Our study showed that protein binding sites in double-stranded DNA molecules can be predicted with an accuracy comparable to that for single-stranded molecules. Our study also demonstrated that using both DNA and protein sequences resulted in better prediction performance than using DNA sequence data alone. The SVM models and datasets constructed in this study are available at http://bclab.inha.ac.kr/pnimodeler.
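For readers unfamiliar with the measures quoted in the abstract, the following is a minimal sketch of how sensitivity, specificity, accuracy, positive and negative predictive value, and the Matthews correlation coefficient are conventionally computed from a binary confusion matrix. The function name `binary_metrics` and the example counts are hypothetical and chosen for illustration only; they are not taken from the paper, and this record does not confirm that the authors computed their figures this way.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Standard binary-classification measures from confusion-matrix counts.

    tp/fn: binding nucleotides predicted as binding / non-binding.
    tn/fp: non-binding nucleotides predicted as non-binding / binding.
    """
    def ratio(num, den):
        # Return 0.0 instead of raising when a denominator is empty.
        return num / den if den else 0.0

    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
        "mcc": ratio(tp * tn - fp * fn, mcc_den),
    }


# Hypothetical counts, for illustration only (not data from the paper).
metrics = binary_metrics(tp=70, fp=40, tn=60, fn=30)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

The Matthews correlation coefficient uses all four cells of the confusion matrix, so it remains informative when binding and non-binding nucleotides are unevenly represented.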

【 License 】

Unknown   
© Im et al.; licensee BioMed Central Ltd. 2015. This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
