Biotechnology & Biotechnological Equipment
An efficient protein homology detection approach based on seq2seq model and ranking
Shaowen Yao1  Song Gao2  Shui Yu3
[1] Department of Cyberspace Security, National Pilot School of Software, Yunnan University; Department of Information and Electronic Science, School of Information Science and Engineering, Yunnan University; School of Computer Science, Faculty of Engineering and Information Technology, University of Technology Sydney
Keywords: homology detection; translation task; seq2seq model; ranking
DOI: 10.1080/13102818.2021.1892522
Source: DOAJ
[Abstract]
Evolutionary information is essential for protein annotation. The number of homologs retrieved for a protein is correlated with the annotations related to its structure or function. As the number of available sequences continues to grow, fast and effective homology detection methods become particularly important. To increase the efficiency of homology detection, this paper proposes a novel method named CONVERT. The method treats homology detection as a translation task and introduces the concept of a representative protein. Representative proteins are not real proteins: each one corresponds to a protein family and captures the characteristics of that family. The method employs a seq2seq model to establish a many-to-one relationship between proteins and representative proteins. Based on this relationship, CONVERT converts protein sequences into fixed-length numerical representations, so that homology detection can rely on numerical comparison instead of sequence alignment, which increases its efficiency. The method then ranks the alignment results to produce a sorted list. We evaluate the proposed method on two benchmark datasets. The experimental results show that its performance is comparable with that of state-of-the-art methods. The method is also very fast, returning results in hundreds of milliseconds.
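The idea of replacing sequence alignment with numerical comparison followed by ranking can be illustrated with a minimal sketch. It is not the CONVERT implementation: the seq2seq encoder is omitted entirely, the three-dimensional vectors are made-up placeholders standing in for the fixed-length representations, the names `cosine_similarity` and `rank_by_similarity` are hypothetical, and the choice of cosine similarity as the comparison measure is an assumption, since the abstract does not say which measure is used.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length numeric vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as dissimilar to everything
    return dot / (norm_a * norm_b)


def rank_by_similarity(query_vec, database):
    """Return (name, score) pairs sorted from most to least similar."""
    scored = [
        (name, cosine_similarity(query_vec, vec))
        for name, vec in database.items()
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Placeholder vectors (assumed, not real data): in practice each would be
# the fixed-length representation produced by a trained encoder.
database = {
    "protein_A": [0.9, 0.1, 0.0],
    "protein_B": [0.0, 1.0, 0.0],
    "protein_C": [0.7, 0.3, 0.1],
}
query = [1.0, 0.0, 0.0]

ranking = rank_by_similarity(query, database)
for name, score in ranking:
    print(f"{name}\t{score:.4f}")
```

Because every comparison is a fixed-cost vector operation, the cost per database entry does not depend on sequence length, which is consistent with the speed advantage over alignment that the abstract describes.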
[License]
Unknown