IEEE Access
Improving Prediction Efficacy Through Abnormality Detection and Data Preprocessing
Naisyin Wang [1], Chun-Chen Tu [1], Pin-Yu Chen [2]
[1] Department of Statistics, University of Michigan, Ann Arbor, MI, USA; [2] IBM Thomas J. Watson Research Center, Yorktown Heights, NY, USA
Keywords: Data preprocessing; Gaussian mixture model; image reconstruction; outlier detection; principal component analysis
DOI: 10.1109/ACCESS.2019.2930257
Source: DOAJ
【 Abstract 】
Abnormal testing data can severely reduce model performance if not processed properly. In this paper, we propose a preprocessing system to handle different types of commonly seen abnormal testing data. The system consists of an aberrant data detector and an aberrant data corrector. The aberrant data detector is responsible for classifying the type of incoming data. Based on the data type, the aberrant data corrector takes different actions to amend the testing data. Users can then apply their preferred prediction methods to the corrected testing data. Specifically, corrupted and adversarial images are used as examples of abnormal data. We show that corrupted data can be reconstructed through a Gaussian locally linear mapping method, and that the prediction performance on adversarial samples can be improved by using their nearest neighbors as a surrogate. We compare the proposed aberrant data detector and corrector with existing, well-recognized alternatives. Those approaches were published individually and do not combine the two components into a preprocessing system. The numerical results show that our proposed components, standing alone, are competitive. The proposed system is a generic method that can be applied to different downstream predictive models. We use three existing prediction methods to illustrate the general usage of the proposed system and its capability of improving prediction efficacy.
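The detector-then-corrector flow described in the abstract can be illustrated with a small, standard-library-only Python sketch. It is not the authors' implementation. A single diagonal Gaussian stands in for the Gaussian mixture model named in the keywords, and every flagged input is simply replaced by its nearest clean training point, which loosely mirrors the nearest-neighbor surrogate idea. The function names (`fit_diag_gaussian`, `anomaly_score`, `nearest_neighbor`, `preprocess`), the toy data, and the threshold rule are all hypothetical choices made for illustration.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def fit_diag_gaussian(train: Sequence[Vector]) -> Tuple[List[float], List[float]]:
    """Per-feature mean and variance of the clean training data.

    Assumption: one diagonal Gaussian, a simplification of a mixture model.
    """
    n, d = len(train), len(train[0])
    mean = [sum(row[j] for row in train) / n for j in range(d)]
    var = [
        max(sum((row[j] - mean[j]) ** 2 for row in train) / n, 1e-6)
        for j in range(d)
    ]
    return mean, var


def anomaly_score(x: Vector, mean: Vector, var: Vector) -> float:
    """Negative log-likelihood of x; larger values mean more unusual inputs."""
    return sum(
        0.5 * math.log(2 * math.pi * v) + (xi - m) ** 2 / (2 * v)
        for xi, m, v in zip(x, mean, var)
    )


def nearest_neighbor(x: Vector, train: Sequence[Vector]) -> int:
    """Index of the training point closest to x (squared Euclidean distance)."""
    return min(
        range(len(train)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(x, train[i])),
    )


def preprocess(
    x: Vector,
    train: Sequence[Vector],
    mean: Vector,
    var: Vector,
    threshold: float,
) -> Tuple[List[float], bool]:
    """Detector + corrector: flag x if its score exceeds the threshold,
    and if flagged, substitute the nearest clean training point."""
    flagged = anomaly_score(x, mean, var) > threshold
    if flagged:
        return list(train[nearest_neighbor(x, train)]), True
    return list(x), False


# Toy usage with made-up 2-D data; any downstream predictor would then be
# applied to `corrected` instead of the raw input.
train = [[0.0, 0.0], [0.1, -0.1], [-0.1, 0.1], [1.0, 1.0], [0.9, 1.1], [1.1, 0.9]]
mean, var = fit_diag_gaussian(train)
# Hypothetical threshold: the largest score seen on the clean training data.
threshold = max(anomaly_score(p, mean, var) for p in train)
corrected, was_flagged = preprocess([5.0, 5.0], train, mean, var, threshold)
```

Because the corrector only changes the input and never touches the predictor, the same preprocessing step can sit in front of any downstream model, which is the sense in which the abstract calls the system generic. A fuller version would distinguish corrupted from adversarial inputs and apply a different correction to each.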
【 License 】
Unknown