Dissertation

【Abstract】

Recent advances in technology have dramatically increased the volume of data that statistical agencies can gather and disseminate. This improved accessibility raises the risk that individuals can be identified from public microdata, which makes the evaluation of disclosure risk and confidentiality control more important. This dissertation addresses three related but distinct research questions in statistical data confidentiality.

The first study concerns the evaluation of disclosure risk for microdata when an intruder attempts to identify survey respondents by linking data records with a large external commercial data file on a set of common variables. The dependence of disclosure risk on the coverage of the commercial data, the accuracy of the common identifying information, and the amount of identifying information available to the intruder is discussed theoretically and tested empirically in an experiment.

The second study presents a practical implementation of the fully imputed synthetic data approach for a large, complex longitudinal survey as a means of protecting confidentiality, following the initial proposals by Rubin (1993) and Little (1993). The imputation uses separate semiparametric algorithms for continuous, binary, and categorical variables. A new combining rule for synthetic data inference is proposed to account for the uncertainty that arises from simultaneously imputing item-missing data and generating synthetic data. The loss of data utility is evaluated with a propensity score approach in addition to three information loss metrics.

The third study extends this fully synthetic data approach to situations where small area statistics are essential. This research is the first in the statistical disclosure control literature to consider small area statistics. The goal is to create synthetic data with enough geographic detail to permit small area analyses, which are otherwise impossible because such geographic identifiers are usually suppressed for disclosure control. A Bayesian framework for appropriate small area models is proposed to generate synthetic microdata from the posterior predictive distributions. Two simulation studies and one empirical illustration are used to evaluate this approach.

Attachments
Files	Size	Format	View
Disclosure Risk Assessments and Control.	2884KB	PDF	download


Disclosure Risk Assessments and Control.
Yu, Mandi; Gutmann, Myron P.
University of Michigan
Keywords: Statistical Disclosure Control; Disclosure Risk Assessments; Fully Synthetic Data; Small Area Estimation; Multiple Imputation; Sequential Regression; Government; Politics and Law; Science; Survey Methodology
Others : https://deepblue.lib.umich.edu/bitstream/handle/2027.42/61661/mandiyu_1.pdf?sequence=1&isAllowed=y
Switzerland | English
Source: The Illinois Digital Environment for Access to Learning and Scholarship


