| 2018 4th International Conference on Environmental Science and Material Application | |
| The Design and Implementation of a Cleaning System Prototype | |
| Ecological and Environmental Science; Materials Science | |
| Yang, He^1 ; Liu, Weiwei^1 ; Wang, Xiaohui^1 ; Liu, He^1 ; Yu, Bin^2 ; Zhou, Hongwei^3 | |
| Artificial Intelligence on Electric Power System State Grid Corporation Joint Laboratory, Global Energy Interconnection Research Institute Co. Ltd, Beijing 102209, China^1 | |
| Beijing Key Laboratory on Integration and Analysis of Large-scale Stream Data, College of Computer Science, North China University of Technology, Beijing 100144, China^2 | |
| State Grid Jiangsu Economic Research Institute, Jiangsu 210000, China^3 | |
| Keywords: Algorithm model; Cleaning system; Data cleaning; Data platform; Data statistics; Design and implementations; Multiple heterogeneous data source; System prototype | |
| Others : https://iopscience.iop.org/article/10.1088/1755-1315/252/3/032218/pdf DOI : 10.1088/1755-1315/252/3/032218 |
| Source: IOP | |
【 Abstract 】
Data is one of the most valuable assets, but raw data is often problematic and not conducive to training algorithm models. To cope with this, dirty data can be processed with cleaning systems [1] to obtain standard, clean data for data statistics, data mining, and other uses. Instead of manually modifying data, writing SQL statements, or using the other cumbersome methods that are currently popular for cleaning data, this article proposes an approach that uses the Hadoop big data platform to handle massive data volumes and to support the cleaning of multiple heterogeneous data sources. Moreover, the system prototype supports custom rules and algorithms and can export results to a specified database, greatly reducing the workload of data cleaning personnel. Based on the system design and theoretical verification presented in this paper, the authors implemented a big data cleaning tool on top of a big data platform. A typical data cleaning process shows that, on the basis of the theory proposed in this paper, data cleaning can be achieved and user operations can be simplified.
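The idea of cleaning with user-defined rules can be illustrated with a minimal sketch. This is not the authors' implementation and does not use Hadoop; it only assumes that a rule is a function that takes a record and returns either a cleaned record or `None` to discard it. The names `strip_whitespace`, `drop_missing_id`, and `clean`, as well as the sample records, are hypothetical.

```python
from typing import Callable, Dict, List, Optional

Record = Dict[str, object]
# Assumption: a rule returns a cleaned record, or None to drop the record.
Rule = Callable[[Record], Optional[Record]]


def strip_whitespace(record: Record) -> Optional[Record]:
    """Hypothetical rule: trim surrounding whitespace from string fields."""
    return {k: v.strip() if isinstance(v, str) else v for k, v in record.items()}


def drop_missing_id(record: Record) -> Optional[Record]:
    """Hypothetical rule: discard records whose 'id' field is missing or empty."""
    return record if record.get("id") not in (None, "") else None


def clean(records: List[Record], rules: List[Rule]) -> List[Record]:
    """Apply each rule in order; a record is kept only if no rule drops it."""
    cleaned = []
    for record in records:
        current: Optional[Record] = record
        for rule in rules:
            current = rule(current)
            if current is None:
                break  # a rule rejected this record
        if current is not None:
            cleaned.append(current)
    return cleaned


# Illustrative "dirty" input, invented for this sketch.
raw = [
    {"id": " 1 ", "name": " Alice "},
    {"id": "", "name": "Bob"},
    {"id": "2", "name": "Carol"},
]
result = clean(raw, [strip_whitespace, drop_missing_id])
```

Keeping each rule as an independent function means new rules can be added without touching the pipeline, which is one plausible way to read "custom rules" in the abstract. Exporting `result` to a database would be a separate step and is not shown here.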
【 Preview 】
| Files | Size | Format |
|---|---|---|
| The Design and Implementation of a Cleaning System Prototype | 278KB | PDF |