Journal Article Details
Electronics
In-Memory Data Anonymization Using Scalable and High Performance RDD Design
Julian Jang-Jaccard [1], SibghatUllah Bazai [1]
[1] Cybersecurity Lab, Computer Science/Information Technology, Massey University, Auckland 0632, New Zealand
Keywords: high performance; data anonymization; scalability; Spark; big data mining; privacy and utility
DOI: 10.3390/electronics9101732
Source: DOAJ
【 Abstract 】

Recent studies of data anonymization techniques have focused primarily on MapReduce. However, existing MapReduce-based approaches often suffer from performance overheads caused by poor data allocation, expensive disk I/O and network transfer, and a lack of support for iterative tasks. We propose “SparkDA”, a novel anonymization technique designed to take full advantage of the Spark platform to generate privacy-preserving anonymized datasets efficiently. Our proposal offers better partition control, in-memory operation, and cache management for the iterative operations that are heavily used in data anonymization. It is built on Spark’s Resilient Distributed Dataset (RDD) and relies on two critical RDD operations, FlatMapRDD and ReduceByKeyRDD. The experimental results demonstrate that our proposal outperforms existing approaches in performance and scalability while maintaining high levels of data privacy and utility. This suggests that our proposal can be used in a wider range of big data applications that demand privacy.
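
To make the flatMap-then-reduceByKey pattern mentioned in the abstract more concrete, the following is a minimal sketch and not the SparkDA algorithm itself. It emulates the two operations in plain Python so it runs without a Spark installation. In PySpark the corresponding calls would be `rdd.flatMap(...)` and `rdd.reduceByKey(...)`. The helper names (`flat_map`, `reduce_by_key`, `generalize`, `is_k_anonymous`), the sample records, and the generalization rules (ten-year age bands, three-digit ZIP prefixes) are all assumptions made for illustration.

```python
def flat_map(records, fn):
    """Local stand-in for RDD.flatMap: apply fn to each record and flatten."""
    return [out for rec in records for out in fn(rec)]


def reduce_by_key(pairs, fn):
    """Local stand-in for RDD.reduceByKey: merge the values that share a key."""
    merged = {}
    for key, value in pairs:
        merged[key] = fn(merged[key], value) if key in merged else value
    return merged


def generalize(record):
    """Hypothetical generalization of the quasi-identifiers (age, ZIP code).

    Emits one ((age_band, zip_prefix), 1) pair per record; the sensitive
    attribute (diagnosis) is left out of the key.
    """
    age, zipcode, _diagnosis = record
    low = (age // 10) * 10
    yield ((f"{low}-{low + 9}", zipcode[:3] + "**"), 1)


def is_k_anonymous(records, k):
    """Count each generalized group and check that every group has >= k rows."""
    counts = reduce_by_key(flat_map(records, generalize), lambda a, b: a + b)
    return all(c >= k for c in counts.values()), counts


# Made-up sample data: (age, ZIP code, diagnosis)
records = [
    (34, "06320", "flu"),
    (37, "06321", "asthma"),
    (52, "06325", "flu"),
    (58, "06329", "diabetes"),
    (31, "06322", "flu"),
]

ok, counts = is_k_anonymous(records, k=2)
print(ok, counts)
```

An iterative anonymizer would typically repeat a step like this with progressively coarser generalizations until the check passes, which is the kind of workload where in-memory caching of intermediate results is expected to help.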

【 License 】

Unknown   
