| BMC Genomics | |
| A comprehensive genomic pan-cancer classification using The Cancer Genome Atlas gene expression data | |
| Research Article | |
| Kevin Lee [1], David M. Umbach [1], Kai Kang [1], Nicole Croutwater [1], Yuanyuan Li [1], Leping Li [1], Juno M. Krahn [2] | |
| [1] Biostatistics and Computational Biology Branch, National Institute of Environmental Health Sciences, NIH, Durham, NC 27709, USA; [2] Genome Integrity & Structural Biology Laboratory, National Institute of Environmental Health Sciences, NIH, Durham, NC 27709, USA | |
| Keywords: Pan-cancer; Classification; GA/KNN; RNA-seq; TCGA; Sex dimorphism | |
| DOI : 10.1186/s12864-017-3906-0 | |
| Received 2017-02-10, accepted 2017-06-27, published 2017 | |
| Source: Springer | |
【 Abstract 】
Background: The Cancer Genome Atlas (TCGA) has generated comprehensive molecular profiles. We aim to identify a set of genes whose expression patterns can distinguish diverse tumor types. Those features may serve as biomarkers for tumor diagnosis and drug development.

Methods: Using RNA-seq expression data, we undertook a pan-cancer classification of 9,096 TCGA tumor samples representing 31 tumor types. We randomly assigned 75% of the samples to a training set and 25% to a test set, allocating samples from each tumor type proportionally.

Results: We could correctly classify more than 90% of the test set samples. Accuracies were high for all but three of the 31 tumor types; the most notable exception was READ (rectum adenocarcinoma), which was largely indistinguishable from COAD (colon adenocarcinoma). We also carried out pan-cancer classification separately for males and females on 23 sex non-specific tumor types (those unrelated to reproductive organs). Results from these gender-specific analyses largely recapitulated the results obtained when gender was ignored. Remarkably, more than 80% of the 100 most discriminative genes selected from each gender separately overlapped. Genes that were differentially expressed between genders included BNC1, FAT2, FOXA1, and HOXA11. FOXA1 has been shown to play a role in sexual dimorphism in liver cancer. The differentially discriminative genes we identified might be important for the gender differences in tumor incidence and survival.

Conclusions: We were able to identify many sets of 20 genes that could correctly classify more than 90% of the samples from 31 different tumor types using TCGA RNA-seq data. This accuracy is remarkable given the number of tumor types and the total number of samples involved. We achieved similar results when we analyzed 23 non-sex-specific tumor types separately for males and females. We regard the frequency with which a gene appeared in those sets as a measure of its importance for tumor classification. One third of the 50 most frequently appearing genes were pseudogenes; the degree of enrichment may be indicative of their importance in tumor classification. Lastly, we identified a few genes that might play a role in sexual dimorphism in certain cancers.
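
The abstract describes two mechanical steps that a short sketch may help make concrete: the proportional 75%/25% split by tumor type, and ranking genes by how often they appear across the selected gene sets. The Python below is only an illustration under stated assumptions, not the authors' pipeline. The function names `stratified_split` and `gene_frequencies`, the seed, the rounding rule, and the toy labels and gene names are all hypothetical. The GA/KNN search that actually produces the gene sets is not shown.

```python
import random
from collections import Counter, defaultdict


def stratified_split(labels, train_fraction=0.75, seed=0):
    """Split sample indices so each tumor type keeps roughly the same ratio.

    Assumption: per-type training counts are rounded to the nearest integer;
    the abstract does not say how fractional counts were handled.
    """
    rng = random.Random(seed)
    by_type = defaultdict(list)
    for idx, tumor_type in enumerate(labels):
        by_type[tumor_type].append(idx)

    train_idx, test_idx = [], []
    for tumor_type in sorted(by_type):
        members = by_type[tumor_type]
        rng.shuffle(members)  # random assignment within each tumor type
        n_train = round(train_fraction * len(members))
        train_idx.extend(members[:n_train])
        test_idx.extend(members[n_train:])
    return sorted(train_idx), sorted(test_idx)


def gene_frequencies(gene_sets):
    """Count in how many gene sets each gene appears (once per set)."""
    counts = Counter()
    for gene_set in gene_sets:
        counts.update(set(gene_set))
    return counts


# Toy data only: sample counts and gene names are placeholders, not TCGA values.
labels = ["COAD"] * 40 + ["READ"] * 12 + ["BRCA"] * 100
train_idx, test_idx = stratified_split(labels)

toy_gene_sets = [
    ["GENE_A", "GENE_B"],
    ["GENE_A", "GENE_C"],
    ["GENE_A", "GENE_B"],
]
ranking = gene_frequencies(toy_gene_sets).most_common()
```

With real data, `labels` would hold one tumor-type code per sample, and `toy_gene_sets` would be replaced by the many 20-gene sets that each classify the samples well; the head of `ranking` then corresponds to the "most frequently appearing genes" mentioned in the conclusions.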
【 License 】
CC BY
© The Author(s). 2017