Journal of Big Data
Prediction of chemoresistance trait of cancer cell lines using machine learning algorithms and systems biology analysis
Javad Zahiri [1], S. Shahriar Arab [2], Niloufar Seyed Majidi [2], Mehrdad Rostami [3], Albert A. Rizvanov [4], Atousa Ataei [4]
[1] Bioinformatics and Computational Omics Lab (BioCOOL), Department of Biophysics, Faculty of Biological Sciences, Tarbiat Modares University; [2] Department of Biophysics, Faculty of Biological Sciences, Tarbiat Modares University; [3] Department of Computer Engineering, University of Kurdistan; [4] Institute of Fundamental Medicine and Biology, Kazan Federal University
Keywords: Machine learning algorithms; Network topology; Cancer; Drug resistance; Classification; Feature selection
DOI: 10.1186/s40537-021-00477-z
Source: DOAJ
【 Abstract 】
Most current cancer treatments are invasive and cause a broad spectrum of side effects. Furthermore, resistance to cancer drugs, known as chemoresistance, is a major obstacle during treatment. This study aims to predict the resistance of several cancer cell lines to the drug cisplatin. Data were obtained from the NCBI GEO database, normalized, and corrected for batch effects with the ComBat software. To select appropriate features for machine learning, feature selection/reduction was performed with the Fisher Score method. Six different machine learning algorithms were then used to distinguish cisplatin-resistant from cisplatin-sensitive samples in cancer cell lines. In addition, differentially expressed genes (DEGs) between all sensitive and resistant samples were identified, and the selected genes were tested for enrichment in biological pathways using the Enrichr database. Topological analysis was then performed on the constructed networks using Cytoscape software. Finally, the biological roles of the genes produced by these analyses were investigated through a literature review. Among the six classifiers trained to distinguish cisplatin-resistant samples from sensitive ones, the KNN and Naïve Bayes algorithms were proposed as the most suitable according to the calculated performance measures. Furthermore, the systems biology analysis identified several potential chemoresistance genes, among which PTGER3, YWHAH, CTNNB1, ANKRD50, EDNRB, ACSL6, and IFNG are topologically more important than the others. These predictions pave the way for further experimental research.
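The abstract names the Fisher Score method but does not give the exact formulation used. As an illustration only, the sketch below ranks features with one common definition of the score: the between-class scatter of a feature divided by its within-class scatter. The function names (`fisher_scores`, `top_k_features`) and the toy expression values are hypothetical and are not taken from the paper.

```python
from statistics import fmean, pvariance


def fisher_scores(rows, labels):
    """Return one Fisher score per feature (column) of `rows`.

    Assumed definition (a common one, not confirmed by the paper):
    sum_c n_c * (mean_c - mean)^2 / sum_c n_c * var_c
    """
    n_features = len(rows[0])
    classes = sorted(set(labels))
    scores = []
    for j in range(n_features):
        column = [row[j] for row in rows]
        overall_mean = fmean(column)
        between = 0.0  # scatter of class means around the overall mean
        within = 0.0   # pooled scatter of samples inside each class
        for c in classes:
            values = [v for v, lab in zip(column, labels) if lab == c]
            n_c = len(values)
            between += n_c * (fmean(values) - overall_mean) ** 2
            within += n_c * pvariance(values)
        # Guard against division by zero for features that are constant within classes.
        scores.append(between / max(within, 1e-12))
    return scores


def top_k_features(rows, labels, k):
    """Return the indices of the k features with the highest Fisher score."""
    scores = fisher_scores(rows, labels)
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return ranked[:k]


# Hypothetical toy data: 6 samples x 3 genes; only gene 0 separates the classes.
labels = ["sensitive"] * 3 + ["resistant"] * 3
rows = [
    [1.0, 5.0, 2.0],
    [1.2, 4.0, 2.1],
    [0.8, 6.0, 1.9],
    [3.0, 5.5, 2.0],
    [3.2, 4.5, 2.2],
    [2.8, 5.0, 1.8],
]
scores = fisher_scores(rows, labels)
best = top_k_features(rows, labels, k=1)
```

On this toy input, gene 0 receives the highest score because its class means differ while its within-class spread is small. In a pipeline like the one described, the retained columns would then be passed to classifiers such as KNN or Naïve Bayes.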
【 License 】
Unknown