Journal Article Details
The Programming Historian
Cleaning Data with OpenRefine
Seth van Hooland [1], Max De Wilde [1], Ruben Verborgh [2]
[1] Université libre de Bruxelles;
[2] Universiteit Gent;
Keywords: OpenRefine; Data cleaning; Data manipulation;
DOI  :  
Source: DOAJ
【 Abstract 】

Duplicate records, empty values and inconsistent formats are phenomena we should be prepared to deal with when using historical data sets. This lesson will teach you how to discover inconsistencies in data contained within a spreadsheet or a database. As we increasingly share, aggregate and reuse data on the web, historians will need to respond to the data quality issues that inevitably arise. Using a program called OpenRefine, you will be able to easily identify systematic errors such as blank cells, duplicates and spelling inconsistencies. OpenRefine not only allows you to quickly diagnose the accuracy of your data, but also to act upon certain errors in an automated manner.
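
The three kinds of systematic error named in the abstract (blank cells, duplicates, spelling inconsistencies) can be illustrated with a minimal sketch in plain Python. This is not OpenRefine's own code or API: the `fingerprint` and `diagnose` helpers, the `category` column and the sample records are all hypothetical, and the normalization step is only loosely modelled on the key-collision idea of grouping values that become identical once case, punctuation and surrounding whitespace are ignored.

```python
import string
from collections import Counter, defaultdict


def fingerprint(value):
    """Reduce a string to a crude key so trivially different spellings collide.

    Assumption: lowercasing, stripping ASCII punctuation and sorting the
    unique tokens is enough for this illustration.
    """
    cleaned = value.strip().lower().translate(
        str.maketrans("", "", string.punctuation)
    )
    return " ".join(sorted(set(cleaned.split())))


def diagnose(rows, column):
    """Report blank cells, exact duplicates and near-identical spellings."""
    values = [row.get(column) or "" for row in rows]

    # Row indexes whose cell is empty or whitespace only.
    blanks = [i for i, v in enumerate(values) if not v.strip()]

    # Values that occur more than once, character for character.
    counts = Counter(v for v in values if v.strip())
    duplicates = sorted(v for v, n in counts.items() if n > 1)

    # Distinct values that share a fingerprint are likely spelling variants.
    clusters = defaultdict(set)
    for v in counts:
        clusters[fingerprint(v)].add(v)
    variants = sorted(sorted(group) for group in clusters.values() if len(group) > 1)

    return {"blanks": blanks, "duplicates": duplicates, "variants": variants}


# Hypothetical sample data, not taken from the lesson.
records = [
    {"category": "Numismatics"},
    {"category": "numismatics "},
    {"category": ""},
    {"category": "Numismatics"},
    {"category": "Ceramics"},
]
report = diagnose(records, "category")
print(report)
```

The sketch only diagnoses problems; deciding which spelling in a cluster is correct, and whether a duplicate is a genuine error, remains a judgment for the person cleaning the data.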

【 License 】

Unknown   
