Journal Article Details
BMC Medical Informatics and Decision Making
Nearest neighbor imputation algorithms: a critical evaluation
Research
Alessandro Santaniello [1], Lorenzo Beretta [1]
[1] Referral Center for Systemic Autoimmune Diseases, Fondazione IRCCS Ca’ Granda Ospedale Maggiore Policlinico, Milan, Italy
Keywords: Near Neighbour; Imputation Method; Imputation Algorithm; Near Neighbour Algorithm; Minkowski Norm
DOI: 10.1186/s12911-016-0318-z
Source: Springer
【 Abstract 】

Background: Nearest neighbor (NN) imputation algorithms are efficient methods for filling in missing data: each missing value in a record is replaced by a value obtained from related cases in the whole set of records. Besides substituting missing data with plausible values that are as close as possible to the true value, imputation algorithms should preserve the original data structure and avoid distorting the distribution of the imputed variable. Despite the efficiency of NN algorithms, little is known about the effect of these methods on data structure.

Methods: Simulations on synthetic datasets with different patterns and degrees of missingness were conducted to evaluate the performance of NN with a single neighbor (1NN) and with k neighbors without weighting (kNN) or with weighting (wkNN), in the context of different learning frameworks: plain set, reduced set after ReliefF filtering, bagging, random choice of attributes, and bagging combined with random choice of attributes (a Random-Forest-like method).

Results: Whatever the framework, kNN usually outperformed 1NN in terms of precision of imputation and reduced errors in inferential statistics. However, 1NN was the only method capable of preserving the data structure; data were distorted even when small values of k were considered, and the distortion was more severe for resampling schemes.

Conclusions: The use of three neighbors in conjunction with ReliefF seems to provide the best trade-off between imputation error and preservation of the data structure. The same conclusions can be drawn from imputation experiments conducted on the single-photon emission computed tomography (SPECTF) heart dataset after introduction of data missing completely at random.
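
The three variants compared in the abstract (1NN, kNN, wkNN) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `minkowski` and `nn_impute`, the use of `None` to mark missing values, the averaging of the distance over shared attributes, and the inverse-distance weighting are all assumptions made for illustration. The learning frameworks (ReliefF filtering, bagging, random attribute choice) are not covered.

```python
import math


def minkowski(a, b, p=2.0):
    """Minkowski distance over the attributes observed in both records.

    Returns math.inf when the two records share no observed attribute.
    (Assumption: the sum is averaged over shared attributes so that
    pairs with different amounts of overlap stay comparable.)
    """
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        return math.inf
    total = sum(abs(x - y) ** p for x, y in shared)
    return (total / len(shared)) ** (1.0 / p)


def nn_impute(rows, k=1, weighted=False, p=2.0):
    """Return a copy of rows with each None replaced by a donor-based value.

    k=1 gives 1NN, k>1 gives kNN (plain mean of the k donors), and
    weighted=True gives a wkNN-style inverse-distance weighted mean.
    """
    imputed = [list(r) for r in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            # Candidate donors: other records where attribute j is observed.
            donors = [
                (minkowski(row, other, p), other[j])
                for m, other in enumerate(rows)
                if m != i and other[j] is not None
            ]
            donors = [d for d in donors if d[0] != math.inf]
            if not donors:
                continue  # no usable donor; leave the value missing
            donors.sort(key=lambda d: d[0])
            nearest = donors[:k]
            if weighted:
                # Small constant avoids division by zero for identical records.
                weights = [1.0 / (dist + 1e-9) for dist, _ in nearest]
            else:
                weights = [1.0] * len(nearest)
            imputed[i][j] = (
                sum(w * v for w, (_, v) in zip(weights, nearest)) / sum(weights)
            )
    return imputed


# Hypothetical toy data: the second record is missing its third attribute.
rows = [
    [1.0, 2.0, 10.0],
    [1.1, 2.1, None],
    [5.0, 6.0, 50.0],
    [1.2, 1.9, 12.0],
]
one_nn = nn_impute(rows, k=1)
k_nn = nn_impute(rows, k=2)
wk_nn = nn_impute(rows, k=2, weighted=True)
```

With k=1 the imputed value is always one that was actually observed in a donor record, whereas averaging over several donors pulls imputed values toward a local mean. That averaging is one plausible reading of why the abstract reports 1NN as the only variant that preserves the data structure.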

【 License 】

CC BY   
© The Author(s). 2016
