Journal article details
BMC Bioinformatics
Transductive learning as an alternative to translation initiation site identification
Methodology Article
Cristiane Neri Nobre [1], Luis Enrique Zárate [1], Cristiano Lacerda Nunes Pinto [2]
[1] Pontifical Catholic University of Minas Gerais - PUC-MG, 255, Walter Ianni Street, 31980-110, Belo Horizonte, Brazil; School of Engineering of Minas Gerais - EMGE, 30150-250, Belo Horizonte, Brazil
Keywords: Machine learning; Transductive learning; SVM; TSVM; Translation initiation site; mRNA
DOI: 10.1186/s12859-017-1502-6
Received 2016-06-25, accepted 2017-01-28, published 2017
Source: Springer
【 Abstract 】

Background: Correctly identifying protein-coding regions is an important and still open problem in molecular biology. It is challenging because biological systems are not yet deeply understood and the conserved features of messenger RNA (mRNA) are poorly characterized. Computational methods that help discover patterns for identifying Translation Initiation Sites (TIS) are therefore essential. In bioinformatics, machine learning methods based on inductive inference, such as the Inductive Support Vector Machine (ISVM), have been widely applied. Methods based on transductive inference, such as the Transductive Support Vector Machine (TSVM), have received much less attention. Transductive inference performs well on problems in which unlabeled sequences considerably outnumber labeled ones. TIS prediction may benefit from transductive methods for the same reason: the number of new sequences grows rapidly as genome projects progress and make new organisms available for study. This work therefore investigates transductive learning for TIS identification and compares its results with those of the inductive method.

Results: Transductive inference gives better F-measure and sensitivity than the inductive method for TIS prediction. It also has the lowest failure rate in identifying TIS, producing fewer False Negatives (FN) than the ISVM. The ISVM and TSVM methods were validated on molecules from the most representative organisms in the RefSeq database: Rattus norvegicus, Mus musculus, Homo sapiens, Drosophila melanogaster and Arabidopsis thaliana. The transductive method achieved F-measure and sensitivity above 90%, higher than the values obtained with ISVM. The ISVM and TSVM approaches were implemented in the TransduTIS tool as TransduTIS-I and TransduTIS-T, respectively, available through a web interface. These approaches were compared with the TISHunter, TIS Miner, and NetStart tools, with satisfactory results.

Conclusions: The ISVM and TSVM classifiers achieve similar precision. However, the TSVM approach yields an improvement, especially in F-measure and sensitivity. The results also point to a potential application of TSVM: organisms in the early stages of study, for which few identified sequences are available in the databases.
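The abstract compares the two classifiers by precision, sensitivity, and F-measure, and by the number of False Negatives. As a minimal sketch, assuming the standard definitions of these metrics (the abstract does not spell out its formulas), they can be computed from confusion counts as follows. The function name `tis_metrics` and the example counts are hypothetical illustrations, not values or code from the paper.

```python
def tis_metrics(tp: int, fp: int, fn: int) -> dict:
    """Compute precision, sensitivity (recall), and F-measure.

    tp: true TIS predicted as TIS
    fp: non-TIS predicted as TIS
    fn: true TIS missed by the classifier (False Negatives)
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + sensitivity
    # F-measure is the harmonic mean of precision and sensitivity.
    f_measure = 2 * precision * sensitivity / denom if denom else 0.0
    return {
        "precision": precision,
        "sensitivity": sensitivity,
        "f_measure": f_measure,
    }


# Hypothetical counts, for illustration only (not results from the paper).
baseline = tis_metrics(tp=90, fp=10, fn=10)
fewer_fn = tis_metrics(tp=95, fp=10, fn=5)
print(baseline)
print(fewer_fn)
```

With precision roughly unchanged, reducing False Negatives raises sensitivity and therefore the F-measure, which matches the pattern the abstract reports for TSVM relative to ISVM.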

【 License 】

CC BY   
© The Author(s) 2017
