Frontiers in Bioscience-Landmark
Comprehensive evaluation of protein-coding sORFs prediction based on a random sequence strategy
Wenwen Jiang [1], Li Guo [1], Bowen Qian [1], Jun Wang [1], Jiafeng Yu [2], Jian Liu [2], Xianghua Dou [2], Congmin Xu [2], Chunling Wang [2]
[1] Department of Bioinformatics, Smart Health Big Data Analysis and Location Services Engineering Lab of Jiangsu Province, School of Geographic and Biologic Information, Nanjing University of Posts and Telecommunications, 210023 Nanjing, Jiangsu, China
[2] Shandong Key Laboratory of Biophysics, Institute of Biophysics, Dezhou University, 253023 Dezhou, Shandong, China
Keywords: small open reading frames; small protein; gene prediction; genome annotation; protein-coding gene
DOI: 10.52586/4943
Source: DOAJ
[Abstract]
Background: Small open reading frames (sORFs) with protein-coding ability present an unprecedented challenge for genome annotation because of their short sequences and low expression levels. In the past decade, only a few prediction methods have been proposed for discovering protein-coding sORFs, and the lack of objective and uniform negative datasets has become a major obstacle to sORF prediction. The prediction efficiency of current sORF prediction methods needs to be evaluated further to provide better research strategies for the discovery of protein-coding sORFs.
Methods: In this work, nine mainstream existing methods for predicting the protein-coding potential of ORFs are comprehensively evaluated based on a random sequence strategy.
Results: The results show that the current methods perform poorly on different sORF datasets. For comparison, a sequence-based prediction algorithm trained on prokaryotic sORFs is proposed, and its better prediction performance indicates that the random sequence strategy can provide a feasible approach to predicting protein-coding sORFs.
Conclusions: Protein-coding sORFs are an important class of functional genomic elements, and their discovery has shed light on the dark proteome. This evaluation indicates that there is an urgent need to develop specialized prediction tools for protein-coding sORFs in both eukaryotes and prokaryotes. The present work is expected to provide new ideas for future sORF research.
[License]
Unknown