Journal article details
Survey Research Methods
Bias and efficiency loss in regression estimates due to duplicated observations: a Monte Carlo simulation
Francesco Sarracino [1], Malgorzata Mikucka [2]
[1] National Institute of Statistics of Luxembourg (STATEC) and National Research University Higher School of Economics
[2] Université Catholique de Louvain and National Research University Higher School of Economics
Keywords: duplicated observations; estimation bias; Monte Carlo simulation; inference
DOI: 10.18148/srm/2017.v11i1.7149
Source: DOAJ
【 Abstract 】

Recent studies documented that survey data contain duplicate records. We assess how duplicate records affect regression estimates, and we evaluate the effectiveness of solutions to deal with duplicate records. Results show that the chances of obtaining unbiased estimates when data contain 40 doublets (about 5% of the sample) range between 3.5% and 11.5% depending on the distribution of duplicates. If 7 quintuplets are present in the data (2% of the sample), then the probability of obtaining biased estimates ranges between 11% and 20%. Weighting the duplicate records by the inverse of their multiplicity or dropping superfluous duplicates outperforms other solutions in all considered scenarios. Our results illustrate the risk of using data in the presence of duplicate records and call for further research on strategies to analyze affected data.
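
The two remedies named in the abstract, weighting duplicate records by the inverse of their multiplicity and dropping superfluous duplicates, can be illustrated with a minimal sketch. This is not the authors' simulation code: the sample size, the linear model, the random seed, and the helper names `multiplicity` and `wls` are all assumptions chosen for illustration, and duplicates are taken to be exact copies of whole records.

```python
import numpy as np

def multiplicity(rows):
    """Return, for each row, how many rows in the array are identical to it."""
    _, inverse, counts = np.unique(
        rows, axis=0, return_inverse=True, return_counts=True
    )
    return counts[inverse.ravel()]

def wls(X, y, w):
    """Weighted least squares via rescaling rows by sqrt(weight)."""
    sw = np.sqrt(w)
    beta, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
    return beta

# Hypothetical data: y = 1 + 2x + noise (all values are illustrative).
rng = np.random.default_rng(0)
n = 200
x = rng.normal(size=n)
y = 1.0 + 2.0 * x + rng.normal(size=n)

# Inject 7 quintuplets: 7 records each appear 5 times in total.
dup_idx = rng.choice(n, size=7, replace=False)
x_d = np.concatenate([x, np.repeat(x[dup_idx], 4)])
y_d = np.concatenate([y, np.repeat(y[dup_idx], 4)])
data = np.column_stack([x_d, y_d])

# Remedy 1: weight each record by the inverse of its multiplicity.
m = multiplicity(data)
X = np.column_stack([np.ones_like(x_d), x_d])
beta_weighted = wls(X, y_d, 1.0 / m)

# Remedy 2: drop superfluous duplicates and fit ordinary least squares.
unique_rows = np.unique(data, axis=0)
X_u = np.column_stack([np.ones(len(unique_rows)), unique_rows[:, 0]])
beta_dedup = wls(X_u, unique_rows[:, 1], np.ones(len(unique_rows)))

print("weighted:", beta_weighted)
print("deduplicated:", beta_dedup)
```

With exact duplicates, the weights within each group of identical records sum to one, so both remedies give the same coefficient estimates in this sketch.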

【 License 】

Unknown   
