| Mathematics | |
| Improving the Representativeness of a Simple Random Sample: An Optimization Model and Its Application to the Continuous Sample of Working Lives | |
| Vicente Núñez-Antón [1], Marta Regúlez-Castillo [1], Juan Manuel Pérez-Salamero González [2], Carlos Vidal-Meliá [2] | |
| [1] Department of Econometrics and Statistics (A.E. III), Faculty of Economics and Business, University of the Basque Country UPV/EHU, 48015 Bilbao, Spain; [2] Department of Financial Economics and Actuarial Science, Faculty of Economics, University of Valencia, 46022 Valencia, Spain | |
| Keywords: chi-square test; continuous sample of working lives; optimization; p-value; subsampling | |
| DOI : 10.3390/math8081225 | |
| Source: DOAJ | |
【Abstract】
This paper proposes an optimization model for selecting a larger subsample that improves the representativeness of a simple random sample previously obtained from a population larger than the population of interest. The problem is formulated as a convex mixed-integer nonlinear program (convex MINLP) and is therefore NP-hard. The solution is nevertheless found by maximizing the size of the subsample taken from a stratified random sample with proportional allocation, subject to the constraint that the p-value of Pearson's chi-square goodness-of-fit test is large enough to indicate a good fit to the population of interest. The paper also applies the model to the Continuous Sample of Working Lives (CSWL), a set of anonymized microdata containing information on individuals from Spanish Social Security records. The results show that, for each of the waves analyzed, it is possible to obtain a larger subsample from the CSWL that (far) better represents the pensioner population.
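The core idea of the abstract (choose the largest subsample whose Pearson chi-square p-value against the population stays above a threshold) can be illustrated with a toy sketch. This is not the authors' convex MINLP formulation or solver: it is an exhaustive search that only works for very small inputs, and the function names (`chi2_sf`, `chi_square_p_value`, `largest_subsample`), the three strata, the population proportions, and the threshold of 0.9 are all assumptions made up for illustration.

```python
import math
from itertools import product


def chi2_sf(stat, df):
    """Upper-tail probability of a chi-square variable with df degrees of freedom.

    Uses the exact recurrence Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / Gamma(a + 1)
    for the regularized upper incomplete gamma function, with y = stat / 2.
    """
    y = stat / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-y)                # Q(1, y)
    else:
        a, q = 0.5, math.erfc(math.sqrt(y))     # Q(1/2, y)
    while a < df / 2.0:
        q += y ** a * math.exp(-y) / math.gamma(a + 1.0)
        a += 1.0
    return q


def chi_square_p_value(counts, pop_props):
    """Pearson goodness-of-fit p-value of stratum counts against population proportions."""
    n = sum(counts)
    expected = [n * p for p in pop_props]
    stat = sum((o - e) ** 2 / e for o, e in zip(counts, expected))
    return chi2_sf(stat, df=len(counts) - 1)


def largest_subsample(available, pop_props, min_p):
    """Brute-force search for the largest stratum counts with p-value >= min_p.

    available[i] is the number of sampled units in stratum i (hypothetical data).
    Returns None if no non-empty subsample meets the threshold.
    """
    best = None
    for counts in product(*(range(a + 1) for a in available)):
        n = sum(counts)
        if n == 0 or (best is not None and n <= sum(best)):
            continue
        if chi_square_p_value(counts, pop_props) >= min_p:
            best = counts
    return best


# Hypothetical example: 15 sampled units over three strata.
available = (6, 5, 4)
pop_props = (0.5, 0.3, 0.2)
best = largest_subsample(available, pop_props, min_p=0.9)
print(best)  # -> (6, 4, 3)
```

In this made-up example the full sample of 15 units has a p-value of about 0.71, so the search drops two units to reach a p-value of about 0.95. The enumeration grows exponentially with the number of strata, which is consistent with the abstract's remark that the problem is NP-hard and needs a dedicated optimization model for realistic data.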
【License】
Unknown