Baghdad Science Journal
Data Mining Techniques for Iraqi Biochemical Dataset Analysis
Sarah Sameer [1], Suhad Faisal Behadili [1]
[1] Computer Science Department, College of Science, University of Baghdad, Baghdad, Iraq
Keywords: Biomedical, Classification And Regression Tree (CART), Data mining, Hierarchical clustering, K-means
DOI: 10.21123/bsj.2022.19.2.0385
Source: DOAJ
Abstract
This research aims to analyze and simulate real biochemical test data in order to uncover the relationships among the tests and how each of them affects the others. The data were acquired from a private Iraqi biochemical laboratory. These data have many dimensions, a high rate of null values, and a large number of patients. Several experiments were applied to the data, beginning with unsupervised techniques such as hierarchical clustering and k-means, but the results were not clear. A preprocessing step was then performed to make the dataset analyzable by supervised techniques: Linear Discriminant Analysis (LDA), Classification And Regression Tree (CART), Logistic Regression (LR), K-Nearest Neighbor (K-NN), Naïve Bayes (NB), and Support Vector Machine (SVM). Among these six supervised algorithms, CART gave clear results with high accuracy. It is worth noting that the preprocessing steps required considerable effort for this type of data: the raw dataset had a null-value ratio of 94.8%, which dropped to 0% after preprocessing. To apply the CART algorithm, several selected tests were treated as classes, and these tests were chosen according to the accuracy they achieved. Consequently, the analysis enables physicians to trace and connect test results with each other, which extends its impact on patients’ health.
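The two quantities the abstract relies on, the null-value ratio of a table and a CART-style split, can be illustrated with a minimal sketch. It is not the authors' code: the abstract does not describe their implementation or preprocessing rules, so the function names (`null_ratio`, `gini`, `best_split`) and the toy data are hypothetical, and the split criterion shown is the Gini impurity that CART implementations commonly use for classification.

```python
from typing import List, Optional, Sequence, Tuple


def null_ratio(rows: Sequence[Sequence[Optional[float]]]) -> float:
    """Fraction of cells that are None across a table of test results."""
    total = sum(len(row) for row in rows)
    missing = sum(1 for row in rows for cell in row if cell is None)
    return missing / total if total else 0.0


def gini(labels: Sequence[int]) -> float:
    """Gini impurity of a collection of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for label in labels:
        counts[label] = counts.get(label, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def best_split(values: Sequence[float],
               labels: Sequence[int]) -> Tuple[Optional[float], float]:
    """Find the threshold on one feature that minimizes weighted Gini impurity.

    Returns (threshold, weighted_gini); threshold is None when no split
    improves on the impurity of the unsplit labels.
    """
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best: Tuple[Optional[float], float] = (None, gini(labels))
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold can separate equal values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left: List[int] = [lab for _, lab in pairs[:i]]
        right: List[int] = [lab for _, lab in pairs[i:]]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[1]:
            best = (threshold, score)
    return best


# Hypothetical toy table: rows are patients, columns are tests, None = not taken.
toy_table = [
    [5.1, None, None],
    [None, None, 7.0],
    [4.8, None, None],
    [None, None, None],
]

# Hypothetical single test value per patient, with a test-derived class label.
toy_values = [4.0, 5.0, 7.0, 8.0]
toy_labels = [0, 0, 1, 1]

ratio = null_ratio(toy_table)
threshold, impurity = best_split(toy_values, toy_labels)
```

A full CART model applies `best_split` recursively over all features; a library implementation would normally be used for that rather than this single-feature sketch.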
License
Unknown