| BMC Bioinformatics | |
| Comparison of scores for bimodality of gene expression distributions and genome-wide evaluation of the prognostic relevance of high-scoring genes | |
| Research Article | |
| Marcus Schmidt1  Jörg Rahnenführer2  Birte Hellwig3  Jan G Hengstler4  Wiebke Schormann4  Mathias C Gehrmann5  | |
| [1] Department of Obstetrics and Gynecology, Johannes Gutenberg University, Medical School, Mainz, Germany; Department of Statistics, TU Dortmund University, 44221, Dortmund, Germany; Department of Statistics, TU Dortmund University, 44221, Dortmund, Germany; Leibniz Research Centre for Working Environment and Human Factors, TU Dortmund University, Dortmund, Germany; Leibniz Research Centre for Working Environment and Human Factors, TU Dortmund University, Dortmund, Germany; Siemens Healthcare Diagnostics Products GmbH, Cologne, Germany | |
| Keywords: Prognostic Relevance; Prognostic Gene; High Expression Group; Expression Distribution; Outlier Group | |
| DOI : 10.1186/1471-2105-11-276 | |
| Received 2010-01-05, accepted 2010-05-25, published 2010 | |
| Source: Springer | |
【 Abstract 】
Background: A major goal of the analysis of high-dimensional RNA expression data from tumor tissue is to identify prognostic signatures for discriminating patient subgroups. For this purpose, genome-wide identification of bimodally expressed genes from gene array data is relevant because high and low expression groups are easier to distinguish for such genes than for genes with unimodal expression distributions.

Recently, several methods for the identification of genes with bimodal distributions have been introduced. A straightforward approach is to cluster the expression values and score the distance between the two distributions. Other scores directly measure properties of the distribution. The kurtosis, for example, measures divergence from a normal distribution. An alternative is the outlier-sum statistic, which identifies genes with extremely high or low expression values in a subset of the samples.

Results: We compare and discuss scores for bimodality of expression data. For the genome-wide identification of bimodal genes we apply all scores to expression data from 194 patients with node-negative breast cancer. Further, we present the first comprehensive genome-wide evaluation of the prognostic relevance of bimodal genes. We first rank genes according to bimodality scores and define two patient subgroups based on expression values. Then we assess the prognostic significance of the top-ranking bimodal genes by comparing the survival functions of the two patient subgroups. We also evaluate the global association between the bimodal shape of expression distributions and survival times with an enrichment-type analysis.

Various cluster-based methods lead to a significant overrepresentation of prognostic genes. A striking result is obtained with the outlier-sum statistic (p < 10^-12). Many genes with heavy tails generate subgroups of patients with different prognosis.

Conclusions: Genes with high bimodality scores are promising candidates for defining prognostic patient subgroups from expression data. We discuss advantages and disadvantages of the different scores for prognostic purposes. The outlier-sum statistic may be particularly valuable for the identification of genes to be included in prognostic signatures. Among the genes identified as bimodal in the breast cancer data set, several have not previously been recognized as prognostic and bimodally expressed in breast cancer.
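
To make two of the scores named in the abstract concrete, the following is a minimal sketch in Python and not the authors' implementation. It assumes the excess kurtosis is computed from population moments and that the outlier-sum statistic follows the common median/MAD formulation (standardize by median and median absolute deviation, then sum the standardized values beyond the upper quartile plus one interquartile range). The function names, the unscaled MAD, and the upper-tail-only threshold are assumptions made for illustration.

```python
import statistics


def excess_kurtosis(values):
    """Excess kurtosis from population moments (0 for a normal distribution).

    Strongly negative values suggest two balanced modes; large positive
    values suggest heavy tails, e.g. a small outlier group.
    """
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return m4 / (m2 ** 2) - 3.0


def outlier_sum(values):
    """Upper-tail outlier sum for one gene (assumed median/MAD formulation).

    Expression values are standardized by the median and the (unscaled)
    median absolute deviation; standardized values above q75 + IQR are
    treated as outliers and summed.
    """
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    if mad == 0:
        raise ValueError("MAD is zero; the gene has no usable spread")
    z = [(v - med) / mad for v in values]
    q1, _, q3 = statistics.quantiles(z, n=4)
    threshold = q3 + (q3 - q1)
    return sum(v for v in z if v > threshold)


if __name__ == "__main__":
    balanced = [0.0] * 10 + [10.0] * 10         # two equally sized modes
    with_outliers = list(range(1, 19)) + [50, 60]  # small high-expression group
    print(excess_kurtosis(balanced))
    print(outlier_sum(with_outliers))
```

Under these assumptions, a lower-tail score can be obtained by applying `outlier_sum` to the negated values. Ranking genes by either score and then splitting patients at the outlier threshold would correspond to the subgroup definition described in the abstract.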
【 License 】
CC BY
© Hellwig et al; licensee BioMed Central Ltd. 2010