Survey Research Methods
Bias and efficiency loss in regression estimates due to duplicated observations: a Monte Carlo simulation
Francesco Sarracino [1], Malgorzata Mikucka [2]
[1] National Institute of Statistics of Luxembourg (STATEC) and National Research University Higher School of Economics; [2] Université Catholique de Louvain and National Research University Higher School of Economics
Keywords: duplicated observations; estimation bias; Monte Carlo simulation; inference
DOI: 10.18148/srm/2017.v11i1.7149
Source: DOAJ
[Abstract]
Recent studies documented that survey data contain duplicate records. We assess how duplicate records affect regression estimates, and we evaluate the effectiveness of solutions to deal with duplicate records. Results show that the chances of obtaining unbiased estimates when data contain 40 doublets (about 5% of the sample) range between 3.5% and 11.5%, depending on the distribution of duplicates. If 7 quintuplets are present in the data (2% of the sample), then the probability of obtaining biased estimates ranges between 11% and 20%. Weighting the duplicate records by the inverse of their multiplicity, or dropping superfluous duplicates, outperforms other solutions in all considered scenarios. Our results illustrate the risk of using data in the presence of duplicate records and call for further research on strategies to analyze affected data.
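
The two best-performing solutions named in the abstract can be illustrated with a minimal sketch. It is not the authors' simulation code: the function names, the toy records, and the simple one-regressor weighted slope are all assumptions made for illustration. It assumes duplicates are exact copies of a record, assigns each copy a weight of 1 divided by the number of copies, and compares the weighted slope with the slope obtained after dropping the superfluous copies.

```python
from collections import Counter


def inverse_multiplicity_weights(records):
    """Return one weight per record: 1 / (number of identical copies)."""
    counts = Counter(records)
    return [1.0 / counts[r] for r in records]


def drop_superfluous_duplicates(records):
    """Keep the first occurrence of each record, preserving order."""
    seen = set()
    unique = []
    for r in records:
        if r not in seen:
            seen.add(r)
            unique.append(r)
    return unique


def weighted_slope(records, weights):
    """Weighted least-squares slope of y on x for (x, y) records."""
    total = sum(weights)
    mean_x = sum(w * x for w, (x, _) in zip(weights, records)) / total
    mean_y = sum(w * y for w, (_, y) in zip(weights, records)) / total
    s_xy = sum(w * (x - mean_x) * (y - mean_y)
               for w, (x, y) in zip(weights, records))
    s_xx = sum(w * (x - mean_x) ** 2 for w, (x, _) in zip(weights, records))
    return s_xy / s_xx


# Hypothetical toy data: the record (2, 3.5) appears three times (a triplet).
records = [(1, 2.0), (2, 3.5), (2, 3.5), (3, 5.0), (2, 3.5)]

weights = inverse_multiplicity_weights(records)
unique = drop_superfluous_duplicates(records)

# Solution 1: keep every row, weight duplicates by inverse multiplicity.
slope_weighted = weighted_slope(records, weights)
# Solution 2: drop superfluous duplicates, weight each remaining row by 1.
slope_dropped = weighted_slope(unique, [1.0] * len(unique))
```

In this toy case the weights of each group of copies sum to one, so the two approaches yield the same point estimate. The sketch only shows the mechanics of the two corrections and says nothing about the bias and efficiency results reported in the abstract.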
[License]
Unknown