BioData Mining
Mining causal relationships among clinical variables for cancer diagnosis based on Bayesian analysis
LiMin Wang1
[1] Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, JiLin University, ChangChun 130012, P. R. China
Keywords: Restricted Bayesian classifier; Cancer diagnosis; Causal relationship
Others: 1177205 DOI: 10.1186/s13040-015-0046-4
Received 2014-09-22, accepted 2015-03-25, published 2015
【 Abstract 】
Background
Cancer is the second leading cause of death worldwide after cardiovascular diseases. Over the past decades, various data mining studies have tried to predict the outcome of cancer. However, only a few reports describe the causal relationships among clinical variables or attributes, which may provide theoretical guidance for cancer diagnosis and therapy. Different restricted Bayesian classifiers have been used to discover information in numerous domains. This study designed a novel Bayesian learning strategy to predict cause-specific death classes and proposed a graphical structure of key attributes to clarify the relationships implicit in the data set.
Results
The working mechanisms of three classical restricted Bayesian classifiers, namely, naive Bayes (NB), tree-augmented naive Bayes (TAN) and the K-dependence Bayesian classifier (KDB), were analysed and summarised. To retain the properties of global optimisation and high-order dependency representation, the proposed learning algorithm, i.e., flexible K-dependence Bayesian network (FKBN), applies a greedy search over the conditional mutual information space to identify the globally optimal ordering of the attributes and to allow the classifiers to be constructed at arbitrary points (values of K) along the attribute dependence spectrum. This method represents the relationships between different attributes by using a directed acyclic graph (DAG) model. A total of 12 data sets were selected from the SEER database and KRBM repository and evaluated by 10-fold cross-validation. The findings revealed that the FKBN model outperformed NB, TAN and KDB.
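The abstract does not spell out FKBN's search procedure, so the following is only a minimal sketch of the classical KDB-style construction that the method is described as building on: attributes are ranked by mutual information with the class, and each attribute then takes at most K earlier attributes as parents, chosen by conditional mutual information. The function names (`mutual_info`, `cond_mutual_info`, `kdb_structure`) and the column-dictionary data layout are assumptions made for illustration, not the author's implementation.

```python
import math
from collections import Counter


def mutual_info(xs, cs):
    """Empirical I(X; C) in nats for two equally long discrete sequences."""
    n = len(cs)
    n_xc = Counter(zip(xs, cs))
    n_x = Counter(xs)
    n_c = Counter(cs)
    return sum(
        (cnt / n) * math.log(cnt * n / (n_x[x] * n_c[c]))
        for (x, c), cnt in n_xc.items()
    )


def cond_mutual_info(xs, ys, cs):
    """Empirical I(X; Y | C) in nats for three discrete sequences."""
    n = len(cs)
    n_xyc = Counter(zip(xs, ys, cs))
    n_xc = Counter(zip(xs, cs))
    n_yc = Counter(zip(ys, cs))
    n_c = Counter(cs)
    return sum(
        (cnt / n) * math.log(cnt * n_c[c] / (n_xc[(x, c)] * n_yc[(y, c)]))
        for (x, y, c), cnt in n_xyc.items()
    )


def kdb_structure(columns, labels, k):
    """Return {attribute: [parent attributes]} for a KDB-style DAG.

    columns maps an attribute name to its list of discrete values
    (hypothetical layout). The class variable is an implicit parent
    of every attribute and is therefore not listed.
    """
    # Step 1: order attributes by decreasing I(X; C).
    order = sorted(
        columns,
        key=lambda a: mutual_info(columns[a], labels),
        reverse=True,
    )
    parents = {}
    for i, attr in enumerate(order):
        # Step 2: rank earlier attributes by I(attr; candidate | C)
        # and keep at most k of them as parents.
        ranked = sorted(
            order[:i],
            key=lambda p: cond_mutual_info(columns[attr], columns[p], labels),
            reverse=True,
        )
        parents[attr] = ranked[:k]
    return parents
```

Because parents are drawn only from attributes earlier in the ordering, the result is acyclic by construction. Setting `k=0` yields the NB structure (no attribute-to-attribute arcs), and larger `k` moves along the dependence spectrum mentioned above.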
Conclusions
A Bayesian classifier can graphically describe the conditional dependency among attributes. The proposed algorithm offers a trade-off between probability estimation and network structure complexity. The direct and indirect relationships between the predictive attributes and the class variable should be considered simultaneously to achieve global optimisation and high-order dependency representation. By analysing the DAG inferred from the breast cancer data set of the SEER database, we divided the attributes into two subgroups, namely, key attributes that should be considered first for cancer diagnosis and those that are independent of each other but are closely related to key attributes. The statistical analysis results clarify some of the causal relationships implied by the DAG.
【 License 】
2015 Wang; licensee BioMed Central.