International Conference on Informatics, Engineering, Science and Technology | |
Classification Consumer Credit for Missing Value Dataset | |
Computer science; Industrial technology | |
Noviandi, I.^1 ; Sumitra, I.D.^1 | |
Universitas Komputer Indonesia (UNIKOM), Bandung, Indonesia^1 | |
Keywords: Categorical data; Classification algorithm; Classification and regression tree; Classification rates; Consumer credits; Customer profiles; Decision-tree algorithm; Logistic regressions | |
Others: https://iopscience.iop.org/article/10.1088/1757-899X/407/1/012173/pdf DOI: 10.1088/1757-899X/407/1/012173 | |
Source: IOP | |
【 Abstract 】
The objective of this study is to find the best method for constructing a model that predicts future failure as a function of variables obtained from the customer profile. Decision Tree and Logistic Regression are classification algorithms. One Decision Tree algorithm is Classification and Regression Tree (CART), which can be used to analyze both numeric and categorical data. Logistic Regression is more accurate than Decision Tree. In practice, datasets contain some missing values. Amelia II is the best method for estimating missing values in numeric and categorical data. This study combines Amelia II to estimate missing values, Decision Tree to screen and re-categorize variables, and Logistic Regression to classify debtors into 'good' and 'bad' risk classes. We found that the accuracy of this combined method remains constant up to 40% missing values: the Correct Classification Rate (CCR) for datasets with 10% to 40% missing values is the same as the CCR for the dataset without missing values. Above 40% missing values, the accuracy decreases. The method is therefore effective when the proportion of missing values in the dataset is below 40%. We recommend that banks apply this method to classify debtor risk when the proportion of missing values is below 40%.
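The abstract reports accuracy as the Correct Classification Rate (CCR) but does not spell out the formula. A minimal sketch, assuming CCR is the share of debtors whose predicted risk class matches the actual class; the function name and the sample labels are hypothetical and are not taken from the paper:

```python
def correct_classification_rate(actual, predicted):
    """Return the fraction of cases whose predicted label equals the actual label.

    Assumption: CCR = (number of correctly classified cases) / (total cases).
    """
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if not actual:
        raise ValueError("at least one case is required")
    correct = sum(1 for a, p in zip(actual, predicted) if a == p)
    return correct / len(actual)


# Hypothetical labels for eight debtors (illustrative only, not data from the paper).
actual = ["good", "good", "bad", "good", "bad", "bad", "good", "good"]
predicted = ["good", "bad", "bad", "good", "bad", "good", "good", "good"]

ccr = correct_classification_rate(actual, predicted)
print(f"CCR = {ccr:.2f}")  # 6 of 8 labels match, so this prints "CCR = 0.75"
```

Under this reading, the reported result would mean that the CCR computed on datasets with 10% to 40% of values imputed equals the CCR computed on the complete dataset.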