In this work, we describe a comprehensive framework for knowledge discovery from medical records, called SDM-Miner. The records were created before, during, and after pancreatic islet cell transplantation in a group of diabetic patients. The knowledge discovery focuses on two goals. The first is selecting the variables most relevant for predicting the outcome of islet cell transplants over time. The second is using machine learning models to support medical understanding of the variable relationships that lead to an insulin-free transplant outcome. The challenges of knowledge discovery lie in the temporally sparse nature of medical records and the large number of variables, which together make traditional statistical analyses ineffective. Our approach to overcoming these challenges is to combine data-driven, computationally intensive modeling with statistical modeling. The framework applies this approach in three phases of knowledge discovery: (1) statistical data preprocessing, (2) pattern-search-based dimensionality reduction, and (3) data-driven modeling based on association rules and conditional probabilities. We evaluate the framework by cross-validating the machine learning models, using prediction errors and the uncertainty of rule discovery as criteria. To demonstrate the novelty of the framework and its improved performance in knowledge discovery, we report results on real and synthetic datasets. Experiments on synthetic data serve as a sanity check that verifies the effectiveness of our models in the absence of standard test results. Compared with previous methods, our framework achieves a smaller mean error as the number of variable samples decreases, higher robustness to Gaussian noise, and higher confidence and support of the discovered association rules. Furthermore, we compare our proposed technique with existing machine learning algorithms from the Weka toolkit and show that it outperforms previous approaches.
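The evaluation above is stated in terms of the support and confidence of association rules. As a minimal illustration of these two standard measures (a generic sketch, not SDM-Miner's implementation), the code below computes them over a toy set of discretized patient records. The variable labels such as `hba1c=low` and `insulin_free` and the records themselves are hypothetical and are not taken from the study's data. Confidence is simply the empirical conditional probability of the rule's consequent given its antecedent, which is one way the association-rule and conditional-probability views connect.

```python
from typing import List, Set


def support(records: List[Set[str]], itemset: Set[str]) -> float:
    """Fraction of records that contain every item in `itemset`."""
    if not records:
        return 0.0
    hits = sum(1 for record in records if itemset <= record)
    return hits / len(records)


def confidence(records: List[Set[str]], antecedent: Set[str], consequent: Set[str]) -> float:
    """Estimate P(consequent | antecedent) as support(A | C) / support(A)."""
    sup_a = support(records, antecedent)
    if sup_a == 0:
        return 0.0  # rule never fires, so confidence is undefined; report 0
    return support(records, antecedent | consequent) / sup_a


# Hypothetical toy records: each set holds discretized variable values
# observed for one (fictional) patient, plus the outcome label if present.
records = [
    {"hba1c=low", "cpeptide=high", "insulin_free"},
    {"hba1c=low", "insulin_free"},
    {"hba1c=low", "cpeptide=low"},
    {"hba1c=high", "cpeptide=high"},
]

# Rule (hypothetical): {hba1c=low} -> {insulin_free}
antecedent = {"hba1c=low"}
consequent = {"insulin_free"}
print("support:", support(records, antecedent | consequent))  # 2 of 4 records
print("confidence:", round(confidence(records, antecedent, consequent), 3))
```

With these toy records the rule holds in 2 of the 3 records where its antecedent appears, so its support is 0.5 and its confidence is about 0.667. Real rule miners such as Apriori enumerate candidate itemsets automatically and keep only rules above user-chosen minimum support and confidence thresholds.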
A Framework for Knowledge Discovery from Sparse, High-Dimensional Medical Datasets