Journal Article Details
BMC Bioinformatics
A discriminative method for family-based protein remote homology detection that combines inductive logic programming and propositional models
Research Article
Gerson Zaverucha1  Juliana S Bernardes2  Alessandra Carbone3 
[1] COPPE, Programa de Engenharia de Sistemas e Computação, Universidade Federal do Rio de Janeiro, Rio de Janeiro, Brazil;COPPE, Programa de Engenharia de Sistemas e Computação, Universidade Federal do Rio de Janeiro, Rio de Janeiro, Brazil;Université Pierre et Marie Curie, UMR7238, Génomique Analytique, 15 rue de l'Ecole de Médecine, F-75006, Paris, France;Université Pierre et Marie Curie, UMR7238, Génomique Analytique, 15 rue de l'Ecole de Médecine, F-75006, Paris, France;CNRS, UMR7238, Laboratoire de Génomique des Microorganismes, F-75006, Paris, France;
Keywords: Support Vector Machine; Frequent Pattern; Logical Representation; Inductive Logic Programming; Position Specific Score Matrix
DOI  :  10.1186/1471-2105-12-83
Received 2010-09-21, accepted 2011-03-23, published 2011
Source: Springer
【 Abstract 】

Background: Remote homology detection is a hard computational problem. Most approaches train computational models on either full protein sequences or multiple sequence alignments (MSA), including all positions. However, for proteins in the "twilight zone", only some segments of the sequences (motifs) are conserved. We introduce a novel logical representation that captures physico-chemical properties of sequences, conserved amino acid positions and conserved physico-chemical positions in the MSA. From this representation, Inductive Logic Programming (ILP) finds the most frequent patterns (motifs) and uses them to train propositional models, such as decision trees and support vector machines (SVM).

Results: We use the SCOP database to perform our experiments, evaluating protein recognition within the same superfamily. Our results show that our methodology, when using SVM, performs significantly better than some state-of-the-art methods and comparably to others. In addition, our method provides a comprehensible set of logical rules that can help to understand what determines a protein function.

Conclusions: The strategy of selecting only the most frequent patterns is effective for remote homology detection. This is possible through a suitable first-order logical representation of homologous properties, and through a set of frequent patterns, found by an ILP system, that summarizes essential features of protein functions.
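The step from frequent patterns to propositional models can be illustrated with a minimal sketch. It is not the authors' implementation: plain regular expressions stand in for the first-order clauses that an ILP system would learn, and the toy sequences, the pattern names, the `min_support` threshold and all function names are hypothetical.

```python
import re

# Hypothetical toy data: sequences labelled as members of one family.
positives = ["ACDKGH", "ACEKGH", "TCDKGW"]

# Hypothetical candidate patterns. The paper learns first-order clauses
# with an ILP system; regular expressions are only a stand-in here.
candidates = {
    "c_then_acidic": re.compile(r"C[DE]"),
    "kg_motif": re.compile(r"KG"),
    "ends_with_w": re.compile(r"W$"),
}

def support(pattern, sequences):
    """Fraction of sequences in which the pattern occurs at least once."""
    return sum(1 for s in sequences if pattern.search(s)) / len(sequences)

def frequent_patterns(candidates, sequences, min_support):
    """Sorted names of the patterns whose support reaches min_support."""
    return sorted(
        name for name, pattern in candidates.items()
        if support(pattern, sequences) >= min_support
    )

def propositionalize(sequence, names, candidates):
    """Encode a sequence as a 0/1 vector, one entry per kept pattern."""
    return [1 if candidates[n].search(sequence) else 0 for n in names]

kept = frequent_patterns(candidates, positives, min_support=0.6)
features = [propositionalize(s, kept, candidates) for s in positives]
```

Each row of `features` is a fixed-length binary vector, so it could be passed to any propositional learner, such as a decision tree or an SVM, together with vectors built the same way from negative examples.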

【 License 】

CC BY   
© Bernardes et al; licensee BioMed Central Ltd. 2011
