Jisuanji kexue yu tansuo | Volume: 15 |
Differentially Private Mixed Data Release Algorithm Based on k-prototype Clustering | |
QU Jingjing, CAI Ying, FAN Yanfang, XIA Hongke [1]  | |
[1] College of Computer, Beijing Information Science and Technology University, Beijing 100101, China; | |
Keywords: differential privacy; mixed datasets; k-prototype; clustering; data release; | |
DOI : 10.3778/j.issn.1673-9418.2003048 | |
Source: DOAJ |
【 Abstract 】
Differential privacy is a model that provides strong privacy protection. Under the non-interactive framework, data managers can publish datasets processed with differential privacy techniques for researchers to mine and analyze. However, a large amount of noise must be added during data release, which reduces data utility. To address this, a differentially private mixed data release algorithm based on k-prototype clustering is proposed. First, the k-prototype clustering algorithm is improved: separate dissimilarity measures are used for numerical attributes and categorical attributes, and records in the mixed dataset that are more likely to be related are grouped together, thereby reducing the differential privacy sensitivity. Then, using the cluster center values, differential privacy is applied to protect the data records: the Laplace mechanism is used for numerical attributes, and the exponential mechanism is used for categorical attributes. The privacy of the algorithm is analyzed on the basis of the definition of differential privacy and its composition properties. Experimental results show that the algorithm can effectively improve data utility.
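The pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the improved dissimilarity measure, the sensitivity bounds, or the budget split, so everything here is an assumption. The sketch uses the standard k-prototype dissimilarity (squared Euclidean distance plus a weighted mismatch count), the Laplace mechanism for the numerical part of a cluster center, and the exponential mechanism (with value counts as the utility score) for the categorical part. The names `mixed_distance`, `noisy_center`, `gamma`, `bounds`, and `domains` are hypothetical.

```python
import random
from collections import Counter
import math


def mixed_distance(record, center, numeric_idx, categorical_idx, gamma=1.0):
    """Standard k-prototype dissimilarity (assumed, not the paper's improved one):
    squared Euclidean distance on numerical attributes plus gamma times the
    number of mismatching categorical attributes."""
    num = sum((record[i] - center[i]) ** 2 for i in numeric_idx)
    cat = sum(1 for i in categorical_idx if record[i] != center[i])
    return num + gamma * cat


def laplace_mechanism(value, sensitivity, epsilon, rng):
    """Add Laplace(0, sensitivity / epsilon) noise to a numerical value.
    The difference of two i.i.d. Exp(1) samples is a Laplace(0, 1) sample."""
    scale = sensitivity / epsilon
    return value + scale * (rng.expovariate(1.0) - rng.expovariate(1.0))


def exponential_mechanism(candidates, utility, sensitivity, epsilon, rng):
    """Pick a candidate with probability proportional to
    exp(epsilon * utility / (2 * sensitivity))."""
    scores = [epsilon * utility(c) / (2.0 * sensitivity) for c in candidates]
    top = max(scores)  # subtract the maximum for numerical stability
    weights = [math.exp(s - top) for s in scores]
    return rng.choices(candidates, weights=weights, k=1)[0]


def noisy_center(cluster, numeric_idx, categorical_idx, bounds, domains,
                 epsilon, rng):
    """Release a perturbed cluster center (illustrative assumptions only).

    The budget is split evenly across attributes (sequential composition).
    Numerical attributes: noisy mean, with sensitivity (hi - lo) / n assumed
    for values bounded in [lo, hi] and a known cluster size n.
    Categorical attributes: exponential mechanism over the attribute domain,
    with the value's count in the cluster as utility (sensitivity 1)."""
    n = len(cluster)
    eps_each = epsilon / (len(numeric_idx) + len(categorical_idx))
    center = {}
    for i in numeric_idx:
        lo, hi = bounds[i]
        mean = sum(r[i] for r in cluster) / n
        center[i] = laplace_mechanism(mean, (hi - lo) / n, eps_each, rng)
    for i in categorical_idx:
        counts = Counter(r[i] for r in cluster)
        center[i] = exponential_mechanism(
            domains[i], lambda c: counts[c], 1.0, eps_each, rng)
    return center


if __name__ == "__main__":
    rng = random.Random(0)
    cluster = [(30.0, "x"), (34.0, "x"), (32.0, "y")]
    print(noisy_center(cluster, [0], [1], {0: (0.0, 100.0)},
                       {1: ["x", "y", "z"]}, epsilon=1.0, rng=rng))
```

Because the sensitivity of a within-cluster mean shrinks as the cluster grows, grouping similar records first means less noise per released value, which is the intuition behind clustering before perturbation.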
【 License 】
Unknown