Journal Article Details
Frontiers in Genetics
OncoRTT: Predicting novel oncology-related therapeutic targets using BERT embeddings and omics features
Genetics
Takashi Gojobori [1], Xin Gao [1], Magbubah Essack [1], Mahmut Uludag [1], Maha A. Thafar [2], Somayah Albaradei [3], Mona Alshahrani [4]
[1] Computer, Electrical and Mathematical Sciences and Engineering Division (CEMSE), Computational Bioscience Research Center, Computer (CBRC), King Abdullah University of Science and Technology (KAUST), Thuwal, Saudi Arabia;Computer, Electrical and Mathematical Sciences and Engineering Division (CEMSE), Computational Bioscience Research Center, Computer (CBRC), King Abdullah University of Science and Technology (KAUST), Thuwal, Saudi Arabia;College of Computers and Information Technology, Computer Science Department, Taif University, Taif, Saudi Arabia;Computer, Electrical and Mathematical Sciences and Engineering Division (CEMSE), Computational Bioscience Research Center, Computer (CBRC), King Abdullah University of Science and Technology (KAUST), Thuwal, Saudi Arabia;Faculty of Computing and Information Technology, King Abdulaziz University, Jeddah, Saudi Arabia;National Center for Artificial Intelligence (NCAI), Saudi Data and Artificial Intelligence Authority (SDAIA), Riyadh, Saudi Arabia;
Keywords: machine learning; sequence embedding; omics; target identification; lung cancer; colon cancer; bioinformatics; deep neural network
DOI: 10.3389/fgene.2023.1139626
Received 2023-01-07, accepted 2023-03-24, published 2023
Source: Frontiers
【 Abstract 】

Late-stage drug development failures are usually a consequence of ineffective targets, so proper target identification is needed, and computational approaches may make it possible. Effective targets have disease-relevant biological functions, and omics data reveal the proteins involved in these functions. In addition, properties that favor binding between a drug and its target can be deduced from the protein’s amino acid sequence. In this work, we developed OncoRTT, a deep learning (DL)-based method for predicting novel therapeutic targets. OncoRTT is designed to reduce suboptimal target selection by identifying novel targets based on features of known effective targets. First, we created the “OncologyTT” datasets, which include genes/proteins associated with ten prevalent cancer types. Then, we generated three sets of features for all genes: omics features, BERT embeddings of the proteins’ amino acid sequences, and the integrated features, and we trained and tested DL classifiers on each set separately. The models achieved high prediction performance in terms of area under the curve (AUC): greater than 0.88 for all cancer types, with a maximum of 0.95 for leukemia. OncoRTT also outperformed the state-of-the-art method on that method's own data in five of the seven cancer types assessed by both methods. Furthermore, OncoRTT predicts novel therapeutic targets using new test data related to the seven cancer types. We further corroborated these results with other validation evidence using the Open Targets Platform and a case study focused on the top-10 predicted therapeutic targets for lung cancer.
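
To make the feature integration and the AUC metric mentioned in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It assumes that the "integrated features" are formed by simple concatenation of an omics vector and a sequence-embedding vector (the abstract does not state how integration is done), and it computes AUC with the standard pairwise-ranking definition. The function names `integrate_features` and `roc_auc` and all numbers are hypothetical toy values.

```python
from typing import List, Sequence


def integrate_features(omics: Sequence[float], embedding: Sequence[float]) -> List[float]:
    """Join an omics feature vector and a sequence-embedding vector.

    Assumption: plain concatenation; the source does not specify the method.
    """
    return list(omics) + list(embedding)


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise ranking.

    Equals the probability that a randomly chosen positive example
    receives a higher score than a randomly chosen negative example,
    counting ties as one half.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative label")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy example: one gene with 3 omics values and a 4-dimensional embedding.
combined = integrate_features([0.2, 1.5, 0.0], [0.1, -0.3, 0.7, 0.05])

# Toy labels (1 = known target) and hypothetical classifier scores.
toy_labels = [1, 1, 0, 0, 1, 0]
toy_scores = [0.9, 0.7, 0.6, 0.2, 0.4, 0.5]
toy_auc = roc_auc(toy_labels, toy_scores)
print(len(combined), round(toy_auc, 3))
```

In this toy case, seven of the nine positive-negative pairs are ranked correctly, so the AUC is 7/9. In practice a library routine such as scikit-learn's `roc_auc_score` would normally be used instead of the quadratic loop shown here.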

【 License 】

Unknown   
Copyright © 2023 Thafar, Albaradei, Uludag, Alshahrani, Gojobori, Essack and Gao.
