Data cubes play an essential role in data analysis and decision support. In a data cube, data from a fact table is aggregated on subsets of the table's dimensions, forming a collection of smaller tables called cuboids. When the fact table includes sensitive data such as salary or diagnosis, publishing even a subset of its cuboids may compromise individuals' privacy. In this thesis, we address several problems in privacy-preserving publishing of data cubes using differential privacy or its extensions, which provide privacy guarantees for individuals by adding noise to query answers. The first problem is how to improve data quality in privacy-preserving data cubes. Our noise-control frameworks choose noise sources in a data cube, i.e., an initial subset of cuboids to compute directly from the fact table, each with a certain amount of injected noise, and then compute the remaining cuboids from them. We show that choosing proper noise sources is NP-hard for certain noise-control objectives, but we provide efficient approximation algorithms. The second problem is how to enforce consistency in the published cuboids. We propose several approaches with provable guarantees on the noise bound, one of which can even improve the utility of differentially private cuboids by reducing error. The third problem is how to calibrate noise in data cubes subject to certain exact background knowledge while still improving data quality. We apply the notion of generic differential privacy and generalize its properties so that it can be plugged into our noise-control frameworks to handle background knowledge. The techniques proposed in this thesis provide advanced principles and major parts of a complete solution for privacy-preserving publishing of data cubes.
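The noise-source idea in the abstract can be illustrated with a minimal sketch. It is not the thesis's algorithm: it assumes count cuboids, the standard Laplace mechanism with sensitivity 1 (neighboring tables differ by one added or removed row), and a single hand-picked noise source. The function names, the toy fact table, and the `dept` and `city` dimensions are all hypothetical.

```python
import random
from collections import Counter
from itertools import product


def laplace(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale`
    # is distributed as Laplace(0, scale).
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_cuboid(rows, dims, domains, epsilon, rng):
    """Count cuboid over `dims`, with Laplace noise on every cell of the domain."""
    counts = Counter(tuple(row[d] for d in dims) for row in rows)
    cells = product(*(domains[d] for d in dims))
    # Assumption: adding or removing one row changes one cell by 1,
    # so the sensitivity is 1 and the noise scale is 1 / epsilon.
    # Empty cells also receive noise; skipping them would leak information.
    return {cell: counts.get(cell, 0) + laplace(1.0 / epsilon, rng) for cell in cells}


def roll_up(cuboid, dims, keep):
    """Derive a coarser cuboid by summing noisy cells (post-processing only)."""
    idx = [dims.index(d) for d in keep]
    out = {}
    for cell, value in cuboid.items():
        key = tuple(cell[i] for i in idx)
        out[key] = out.get(key, 0.0) + value
    return out


# Hypothetical fact table with two dimensions.
rows = [
    {"dept": "eng", "city": "NYC"},
    {"dept": "eng", "city": "SF"},
    {"dept": "ops", "city": "NYC"},
    {"dept": "eng", "city": "NYC"},
]
domains = {"dept": ["eng", "ops"], "city": ["NYC", "SF"]}
rng = random.Random(0)

# Noise source: the (dept, city) cuboid is computed from the fact table with noise.
base = noisy_cuboid(rows, ("dept", "city"), domains, epsilon=1.0, rng=rng)
# Remaining cuboid: (dept) is computed from the noise source, not from the fact table.
by_dept = roll_up(base, ("dept", "city"), ("dept",))
```

Because `by_dept` is derived only from the noisy `base`, it consumes no additional privacy budget, but each of its cells accumulates the noise of the cells summed into it. Which cuboids to use as noise sources, and how much noise to give each, is the trade-off the noise-control frameworks are described as optimizing.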
Privacy-preserving data publishing and analytics using data cubes