Journal article details
International Journal on Informatics Visualization: JOIV
Text Classification Using Genetic Programming with Implementation of Map Reduce and Scraping
article
Wirarama Wedashwara [1], Budi Irmawati [1], Heri Wijayanto [1], I Wayan Agus Arimbawa [2], Vandha Pradwiyasma Widartha [3]
[1] University of Mataram; Seoul National University; Telkom University
Keywords: Text Classification; Genetic Programming; Web Scraping; Map-reduce
DOI: 10.30630/joiv.7.2.1813
Source: Politeknik Negeri Padang
【 Abstract 】

Classifying text documents from online media is a big data problem that requires automation. Classification accuracy can decrease when many terms are ambiguous between classes. Hadoop MapReduce is a parallel processing framework that has been widely used for text processing on big data. This study presents text classification using genetic programming, with text pre-processing on Hadoop MapReduce and data collection through web scraping. Genetic programming performs association rule mining (ARM) before text classification in order to analyze patterns in the data. The data are articles from ScienceDirect retrieved with three keywords. The study aims to combine ARM-based data pattern analysis with a system that collects data through web scraping, pre-processes it using map-reduce, and classifies text using genetic programming. Web scraping collected the data, with duplicates reduced by as many as 17718. Map-reduce performed tokenization and stop-word removal, yielding 36639 terms, of which 5189 are unique terms and 31450 are common terms. Evaluation of ARM with different amounts of data shows that the multi-tree can produce more rules, longer rules, and better support. The multi-tree also produces more specific rules and better ARM performance than a single tree. The text classification evaluation shows that a single tree achieves the best accuracy (0.7042), followed by a decision tree (0.6892), with the multi-tree lowest (0.6754). The evaluation also shows that the ARM results are not in line with the classification results: in ARM, the multi-tree gives the best result (0.3904), followed by the decision tree (0.3588), with the single tree lowest (0.356).
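
The pre-processing step described in the abstract (tokenization and stop-word removal in a map-reduce style) can be sketched as follows. This is a minimal, in-memory illustration in plain Python, not the authors' Hadoop job: the stop-word list, the tokenization rule, the function names, and the two sample documents are all assumptions made for the example.

```python
import re
from collections import defaultdict

# Hypothetical stop-word list for illustration; the paper's list is not given.
STOP_WORDS = {"a", "an", "and", "the", "of", "is", "for", "to", "in", "on"}


def map_phase(doc):
    """Mapper: emit (term, 1) for every token of one document that is not a stop word."""
    for token in re.findall(r"[a-z]+", doc.lower()):
        if token not in STOP_WORDS:
            yield token, 1


def shuffle(pairs):
    """Shuffle: group emitted values by key, as a framework would between the two phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Reducer: sum the emitted counts for each term."""
    return {term: sum(values) for term, values in grouped.items()}


def term_counts(docs):
    """Run map, shuffle, and reduce over a list of documents."""
    pairs = (pair for doc in docs for pair in map_phase(doc))
    return reduce_phase(shuffle(pairs))


# Sample documents invented for this sketch.
docs = [
    "Genetic programming for text classification",
    "Text classification is a big data problem",
]
counts = term_counts(docs)
```

Here `sum(counts.values())` gives the total number of retained tokens and `len(counts)` the number of distinct terms. In a real Hadoop deployment the mapper and reducer would run as separate distributed tasks, with the framework handling the shuffle.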

【 License 】

Unknown   
