Journal article details
BMC Bioinformatics
Fast batch searching for protein homology based on compression and clustering
Research Article
Jinghong Yu [1], Hongwei Ge [1], Liang Sun [1]
[1] College of Computer Science and Technology, Dalian University of Technology, No. 2 Linggong Road, Dalian, China
Keywords: Protein homology; Batch searching; Compression; Clustering
DOI: 10.1186/s12859-017-1938-8
Received 2017-07-18; accepted 2017-11-14; published 2017
Source: Springer
【 Abstract 】

Background
In the bioinformatics community, many tasks involve matching a set of protein query sequences against large sequence datasets. To run multiple queries against a database, a commonly used method is to run BLAST on each original query or on the concatenated queries. This is inefficient because it does not exploit the common subsequences shared by the queries.

Results
We propose a compression- and clustering-based BLASTP (C2-BLASTP) algorithm that further exploits the joint information among the query sequences and the database. First, the queries and the database are compressed in turn through redundancy analysis, redundancy removal and distinction recording. Second, the database is clustered according to the Hamming distance among its subsequences; to improve the sensitivity and selectivity of the sequence alignments, ten groups of reduced amino acid alphabets are used. Next, the hit-finding operator is applied to the clustered database. An execution database is then constructed from the potential hits found, with the objective of mitigating the effect of the growing scale of the sequence database. Finally, the homology search is performed in the execution database. Experiments on the NCBI NR database demonstrate the effectiveness of C2-BLASTP for batch homology searching in sequence databases. The results are evaluated in terms of homology accuracy, search speed and memory usage.

Conclusions
C2-BLASTP achieves competitive results compared with several state-of-the-art methods.
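The abstract does not specify C2-BLASTP's data structures, so the following is only a minimal sketch, under stated assumptions, of three of its steps: mapping sequences onto a reduced amino acid alphabet, clustering fixed-length database subsequences (k-mers) by Hamming distance, finding hits against the clustered database, and keeping only hit sequences as an "execution database". The residue grouping in `GROUPS`, the greedy leader clustering, the triangle-inequality pruning, and every function name (`reduce_seq`, `cluster_kmers`, `find_hits`, `execution_db`) are hypothetical choices for illustration, not the authors' method. The compression step is not modelled.

```python
from typing import Dict, List, Tuple

# Hypothetical reduced alphabet: residues in one group share a symbol.
# Illustrative assumption only, not one of the ten alphabets in C2-BLASTP.
GROUPS = ["LVIM", "C", "A", "G", "ST", "P", "FYW", "EDNQ", "KR", "H"]
REDUCE = {aa: chr(ord("a") + i) for i, grp in enumerate(GROUPS) for aa in grp}


def reduce_seq(seq: str) -> str:
    """Map a protein sequence onto the reduced alphabet ('x' for unknown residues)."""
    return "".join(REDUCE.get(aa, "x") for aa in seq.upper())


def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length strings."""
    return sum(ca != cb for ca, cb in zip(a, b))


def kmers(seq: str, k: int) -> List[Tuple[str, int]]:
    """All length-k subsequences of seq with their start offsets."""
    return [(seq[i:i + k], i) for i in range(len(seq) - k + 1)]


def cluster_kmers(db: Dict[str, str], k: int, radius: int) -> List[dict]:
    """Greedy leader clustering: a k-mer joins the first cluster whose
    representative is within `radius` mismatches, else it starts a new one.
    Every member is therefore within `radius` of its representative."""
    clusters: List[dict] = []
    for seq_id, seq in db.items():
        for word, pos in kmers(reduce_seq(seq), k):
            for c in clusters:
                if hamming(word, c["rep"]) <= radius:
                    c["members"].append((word, seq_id, pos))
                    break
            else:
                clusters.append({"rep": word, "members": [(word, seq_id, pos)]})
    return clusters


def find_hits(query: str, clusters: List[dict], k: int, radius: int,
              max_mm: int) -> List[Tuple[int, str, int]]:
    """Return (query_pos, seq_id, db_pos) for database k-mers within max_mm
    mismatches of a query k-mer. Because Hamming distance is a metric, a
    cluster whose representative is more than radius + max_mm away cannot
    contain a qualifying member, so it is skipped without scanning members."""
    hits = set()
    for word, qpos in kmers(reduce_seq(query), k):
        for c in clusters:
            if hamming(word, c["rep"]) > radius + max_mm:
                continue  # prune the whole cluster
            for member, seq_id, pos in c["members"]:
                if hamming(word, member) <= max_mm:
                    hits.add((qpos, seq_id, pos))
    return sorted(hits)


def execution_db(db: Dict[str, str], hits) -> Dict[str, str]:
    """Keep only the database sequences that received at least one hit."""
    ids = {seq_id for _, seq_id, _ in hits}
    return {sid: db[sid] for sid in ids}


if __name__ == "__main__":
    db = {"s1": "MKVLAAGIVG", "s2": "PPPPHHHHCC", "s3": "MKILAAGLVG"}
    clusters = cluster_kmers(db, k=4, radius=1)
    hits = find_hits("MKVLAA", clusters, k=4, radius=1, max_mm=0)
    print(sorted(execution_db(db, hits)))  # ['s1', 's3']
```

In this toy run, `s3` differs from the query in raw residues (I instead of V) but reduces to the same symbols, so it is reported alongside `s1`, while `s2` is excluded before any alignment would be attempted. A real implementation would then run the full alignment only against the returned execution database.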

【 License 】

CC BY   
© The Author(s) 2017
