Journal Article Details
BMC Bioinformatics
A comparative study of SMILES-based compound similarity functions for drug-target interaction prediction
Research Article
Hakime Öztürk [1], Arzucan Özgür [1], Elif Ozkirimli [1]
[1] Department of Computer Engineering, Bogazici University, Bebek, 34342, Istanbul, Turkey
Keywords: Chemoinformatics; SMILES; SMILES based drug similarity; Drug-target interaction prediction
DOI: 10.1186/s12859-016-0977-x
Received 2015-03-25, accepted 2016-03-03, published 2016
Source: Springer
【 Abstract 】

Background: Molecular structures can be represented as strings of special characters using SMILES. Since each molecule is represented as a string, the similarity between compounds can be computed using SMILES-based string similarity functions. Most previous studies on drug-target interaction prediction use 2D-based compound similarity kernels such as SIMCOMP. To the best of our knowledge, SMILES-based similarity functions, which are computationally more efficient than the 2D-based kernels, have not been investigated for this task before.

Results: In this study, we adapt and evaluate various SMILES-based similarity methods for drug-target interaction prediction. In addition, inspired by the vector space model of Information Retrieval, we propose cosine similarity based SMILES kernels that make use of the Term Frequency (TF) and Term Frequency-Inverse Document Frequency (TF-IDF) weighting approaches. We also investigate generating composite kernels by combining our best SMILES-based similarity functions with the SIMCOMP kernel. With this study, we provide a comparison of 13 different ligand similarity functions, each of which uses the SMILES string representation of the molecule. Additionally, TF and TF-IDF based cosine similarity kernels are proposed.

Conclusion: The more efficient SMILES-based similarity functions performed similarly to the more complex 2D-based SIMCOMP kernel in terms of AUC-ROC scores. The TF-IDF based cosine similarity obtained a better AUC-PR score than the SIMCOMP kernel on the GPCR benchmark data set. The composite kernel of TF-IDF based cosine similarity and SIMCOMP achieved the best AUC-PR scores for all data sets.
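The idea of a TF-IDF weighted cosine similarity over SMILES strings can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how SMILES strings are split into terms, so the overlapping 4-character substrings used here, the `log(N/df) + 1` form of the IDF weight, and all function names are assumptions made for illustration only.

```python
import math
from collections import Counter


def substrings(smiles, q=4):
    """Split a SMILES string into overlapping q-character terms (assumed tokenization)."""
    if len(smiles) < q:
        return [smiles]
    return [smiles[i:i + q] for i in range(len(smiles) - q + 1)]


def tfidf_vectors(smiles_list, q=4):
    """Build one sparse TF-IDF vector (dict: term -> weight) per SMILES string."""
    docs = [Counter(substrings(s, q)) for s in smiles_list]
    n_docs = len(docs)
    # Document frequency: number of SMILES strings in which each term occurs.
    df = Counter(term for doc in docs for term in doc)
    # Assumed IDF variant; the +1 keeps terms shared by every string from vanishing.
    idf = {term: math.log(n_docs / count) + 1.0 for term, count in df.items()}
    return [{term: tf * idf[term] for term, tf in doc.items()} for doc in docs]


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Illustrative inputs: aspirin, salicylic acid, ethanol.
compounds = ["CC(=O)OC1=CC=CC=C1C(=O)O", "OC(=O)C1=CC=CC=C1O", "CCO"]
vectors = tfidf_vectors(compounds)
kernel = [[cosine(a, b) for b in vectors] for a in vectors]
```

The resulting `kernel` is a symmetric compound-by-compound similarity matrix with ones on the diagonal. Dropping the IDF factor (using the raw counts as weights) gives the plain TF variant.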

【 License 】

CC BY   
© Öztürk et al. 2016
