| PATTERN RECOGNITION | Volume: 43 |
| Selection-fusion approach for classification of datasets with missing values | |
| Article | |
| Ghannad-Rezaie, Mostafa [1,2,3]; Soltanian-Zadeh, Hamid [1,4]; Ying, Hao [2]; Dong, Ming [5] | |
| [1] Henry Ford Hosp, Dept Diagnost Radiol, Detroit, MI 48202 USA | |
| [2] Wayne State Univ, Dept Elect & Comp Engn, Detroit, MI 48202 USA | |
| [3] Univ Michigan, Dept Biomed Engn, Ann Arbor, MI 48105 USA | |
| [4] Univ Tehran, Dept Elect & Comp Engn, Control & Intelligent Proc Ctr Excellence, Tehran 14395515, Iran | |
| [5] Wayne State Univ, Dept Comp Sci, Detroit, MI 48202 USA | |
| Keywords: Missing value management; Subspace classifiers; Ensemble classifiers; Multiple imputations; Pruning; Support vector machine (SVM) | |
| DOI : 10.1016/j.patcog.2009.12.003 | |
| Source: Elsevier | |
【 Abstract 】
This paper proposes a new approach, based on missing value pattern discovery, for classifying incomplete data. The approach is designed specifically for datasets with a small number of samples and a high percentage of missing values, where existing missing value treatment approaches do not usually work well. Based on the pattern of the missing values, the proposed approach finds subsets of samples for which most of the features are available and trains a classifier for each subset. It then combines the outputs of the classifiers. Subset selection is translated into a clustering problem, allowing derivation of a mathematical framework for it. A trade-off is established between the computational complexity (number of subsets) and the accuracy of the overall classifier. To deal with this trade-off, a numerical criterion is proposed for predicting the overall performance. The proposed method is applied to seven datasets from the popular University of California, Irvine data mining archive and an epilepsy dataset from Henry Ford Hospital, Detroit, Michigan (eight datasets in total). Experimental results show that the classification accuracy of the proposed method is superior to that of the widely used multiple imputations method and of four other methods. They also show that the level of superiority depends on the pattern and percentage of missing values. © 2009 Elsevier Ltd. All rights reserved.
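The select-then-fuse idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and it makes several simplifying assumptions: each distinct missing-value pattern in the training data is used directly as a feature subset (the paper instead formulates subset selection as a clustering problem and prunes the subsets), a small nearest-centroid classifier stands in for the SVM named in the keywords, and the outputs are fused by plain majority vote. The names `NearestCentroid`, `fit_subspace_ensemble`, `predict_one`, and `min_samples` are hypothetical.

```python
import numpy as np


class NearestCentroid:
    """Tiny stand-in classifier (assumption: the paper uses SVMs instead)."""

    def fit(self, X, y):
        self.classes_ = np.unique(y)
        # One centroid per class, computed on the given feature subset.
        self.centroids_ = np.stack([X[y == c].mean(axis=0) for c in self.classes_])
        return self

    def predict(self, X):
        # Squared Euclidean distance from every sample to every centroid.
        d = ((X[:, None, :] - self.centroids_[None, :, :]) ** 2).sum(axis=2)
        return self.classes_[d.argmin(axis=1)]


def fit_subspace_ensemble(X, y, min_samples=2):
    """Train one classifier per distinct availability pattern found in X.

    Missing values are encoded as NaN. For each pattern, the classifier is
    trained on every sample that has all of the pattern's features observed.
    """
    avail = ~np.isnan(X)
    patterns = np.unique(avail.astype(np.uint8), axis=0).astype(bool)
    models = []
    for mask in patterns:
        if not mask.any():
            continue  # a pattern with no observed feature cannot train anything
        rows = avail[:, mask].all(axis=1)
        if rows.sum() < min_samples:
            continue  # too few complete samples for this feature subset
        clf = NearestCentroid().fit(X[rows][:, mask], y[rows])
        models.append((mask, clf))
    return models


def predict_one(models, x):
    """Fuse by majority vote over the classifiers whose features x provides."""
    avail = ~np.isnan(x)
    votes = [clf.predict(x[mask][None, :])[0]
             for mask, clf in models if avail[mask].all()]
    if not votes:
        return None  # no trained subset is usable for this sample
    values, counts = np.unique(votes, return_counts=True)
    return values[counts.argmax()]
```

A sample is classified only by those classifiers whose feature subsets it fully covers, so no value is ever imputed. Using every observed pattern as a subset illustrates one end of the complexity/accuracy trade-off mentioned above: the number of classifiers grows with the number of distinct patterns, which is what the paper's subset selection is meant to control.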
【 License 】
Free
【 Preview 】
| Files | Size | Format | View |
|---|---|---|---|
| 10_1016_j_patcog_2009_12_003.pdf | 495KB | PDF | |