Conference Paper Details
International Conference on Design, Engineering and Computer Sciences 2018
Data Pattern Single Column Analysis for Data Profiling using an Open Source Platform
Industrial Technology; Computer Science
Amethyst, S.R.^1 ; Kusumasari, T.F.^1 ; Hasibuan, M.A.^1
Information System Department, School of Industrial Engineering, Telkom University Bandung, Indonesia^1
Keywords: Business Process; Data patterns; Data preprocess; Data profiling; Data quality; Open source platforms; Open source tools; Quality of data
Others  :  https://iopscience.iop.org/article/10.1088/1757-899X/453/1/012024/pdf
DOI  :  10.1088/1757-899X/453/1/012024
Source: IOP
【 Abstract 】

Data quality can have a major impact on a company's existing business processes, yet many companies have yet to understand its importance. A common data quality problem in many companies in Indonesia is that input data are not filtered, which leads to non-standardized data patterns. This can be handled with data preprocessing, one method of which is data profiling. Data profiling is the process of collecting information about data. This research focuses on conducting data profiling using a data pattern method and an algorithm adopted from OpenRefine and then modified. The profiling results from the open source tools Pentaho Data Integration, Google OpenRefine, and Data Cleaner differ markedly: Pentaho Data Integration and Google OpenRefine each found exactly 70 data patterns, while Data Cleaner found only 31.
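
As a rough illustration of what single-column data pattern profiling can look like, the sketch below maps each character of a value to a class symbol and counts the distinct patterns in a column. It is not the algorithm from the paper, whose OpenRefine-derived details are not given in the abstract. The symbol scheme (`A` for uppercase, `a` for lowercase, `9` for digits), the function names `to_pattern` and `profile_column`, and the sample values are all assumptions made for illustration.

```python
from collections import Counter


def to_pattern(value: str) -> str:
    """Map a value to a character-class pattern (assumed scheme):
    digits -> '9', uppercase letters -> 'A', other letters -> 'a',
    every other character is kept as-is."""
    out = []
    for ch in value:
        if ch.isdigit():
            out.append("9")
        elif ch.isalpha():
            out.append("A" if ch.isupper() else "a")
        else:
            out.append(ch)
    return "".join(out)


def profile_column(values):
    """Count how often each pattern occurs in a single column."""
    return Counter(to_pattern(v) for v in values)


# Hypothetical sample column, not data from the paper.
sample = [
    "022-7564108",
    "022-7565930",
    "0812 3456 789",
    "Bandung",
    "bandung",
    "JKT01",
]

patterns = profile_column(sample)
for pattern, count in patterns.most_common():
    print(f"{pattern}\t{count}")
```

Under a scheme like this, the number of distinct patterns depends on how characters are grouped (for example, whether case is distinguished or runs of the same symbol are collapsed), which is one plausible reason different tools can report different pattern counts for the same column.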
