| Frontiers in Cell and Developmental Biology | |
| Identification of Pan-Cancer Biomarkers Based on the Gene Expression Profiles of Cancer Cell Lines | |
| Tao Huang [2], XianChao Zhou [3], Yu-Hang Zhang [4], Hao Li [5], ZhanDong Li [5], Lei Chen [6], KaiYan Feng [7], Yu-Dong Cai [8], ShiJian Ding [8] | |
| [1] CAS Key Laboratory of Computational Biology, Bio-Med Big Data Center, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, China; [2] CAS Key Laboratory of Tissue Microenvironment and Tumor, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, China; [3] Center for Single-Cell Omics, School of Public Health, Shanghai Jiao Tong University School of Medicine, Shanghai, China; [4] Channing Division of Network Medicine, Brigham and Women’s Hospital, Harvard Medical School, Boston, MA, United States; [5] College of Food Engineering, Jilin Engineering Normal University, Changchun, China; [6] College of Information Engineering, Shanghai Maritime University, Shanghai, China; [7] Department of Computer Science, Guangdong AIB Polytechnic College, Guangzhou, China; [8] School of Life Sciences, Shanghai University, Shanghai, China | |
| Keywords: pan-cancer study; feature selection; classification algorithm; decision rule; biomarker | |
| DOI: 10.3389/fcell.2021.781285 | |
| Source: DOAJ | |
【 Abstract 】
There are many types of cancer. Although they share some hallmarks, such as proliferation and metastasis, they differ in many respects, and they arise in different organs or tissues. Does each cancer type have a unique gene expression pattern that distinguishes it from the others? Since The Cancer Genome Atlas (TCGA) project, the number of pan-cancer studies has grown steadily, and researchers have sought robust gene expression signatures from pan-cancer patient cohorts. However, tumor heterogeneity introduces large variance among patients, so obtaining robust results from patient data would require a sample size too large to recruit in practice. In this study, we took another approach to obtaining robust pan-cancer biomarkers: using cell line data to reduce the variance. We applied several advanced computational methods to the Cancer Cell Line Encyclopedia (CCLE) gene expression profiles, which included 988 cell lines from 20 cancer types. Two feature selection methods, Boruta and max-relevance and min-redundancy (mRMR), were applied to the cell line gene expression data in sequence to generate a ranked feature list. This list was then fed into the incremental feature selection method, combined with a classification algorithm, to extract biomarkers and to construct optimal classifiers and decision rules. The optimal classifiers performed well and can serve as useful tools for identifying cell lines from different cancer types. The biomarkers (e.g., NCKAP1, TNFRSF12A, LAMB2, FKBP9, PFN2, TOM1L1) and rules identified in this work may provide a meaningful and precise reference for differentiating multiple types of cancer and may contribute to the personalized treatment of tumors.
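The incremental feature selection step described above can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several parts are assumptions made only for illustration: the data are synthetic (generated with scikit-learn rather than taken from CCLE), a simple mutual-information ranking stands in for the Boruta and mRMR stages, and a decision tree is used as the classifier because the abstract does not name the algorithm. The helper names `rank_features` and `incremental_feature_selection` are hypothetical.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import mutual_info_classif
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier


def rank_features(X, y, seed=0):
    """Return feature indices sorted from most to least relevant.

    Assumption: mutual information is a stand-in for the
    Boruta + mRMR ranking described in the abstract.
    """
    scores = mutual_info_classif(X, y, random_state=seed)
    return np.argsort(scores)[::-1]


def incremental_feature_selection(X, y, ranked, step=1, max_features=None, seed=0):
    """Score the top-k ranked features for growing k and keep the best k."""
    max_features = max_features or len(ranked)
    results = []
    for k in range(step, max_features + 1, step):
        # Assumption: a decision tree; the abstract does not name the classifier.
        clf = DecisionTreeClassifier(random_state=seed)
        acc = cross_val_score(clf, X[:, ranked[:k]], y, cv=5).mean()
        results.append((k, acc))
    # max() returns the first maximum, so ties resolve to the smallest k.
    best_k, best_acc = max(results, key=lambda r: r[1])
    return best_k, best_acc, results


# Synthetic stand-in for an expression matrix (samples x genes) with 3 classes.
X, y = make_classification(
    n_samples=300, n_features=40, n_informative=6, n_redundant=2,
    n_classes=3, n_clusters_per_class=1, random_state=0,
)
ranked = rank_features(X, y)
best_k, best_acc, results = incremental_feature_selection(X, y, ranked, max_features=20)
print(f"best subset size: {best_k}, cross-validated accuracy: {best_acc:.3f}")
```

In this sketch the features in `ranked[:best_k]` play the role of the candidate biomarkers, and a tree fitted on them could be read off as a set of decision rules. With real expression data, the ranking stage and the classifier would need to be replaced by the methods the paper actually used.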
【 License 】
Unknown