Conference Paper Details
High Performance Computing Symposium 2013
A case-comparison study of automatic document classification utilizing both serial and parallel approaches
Computer Science; Physics
Wilges, B.^1 ; Bastos, R.C.^1 ; Mateus, G.P.^2 ; Dantas, M.A.R.^2
Department of Engineering and Knowledge Management (EGC), Federal University of Santa Catarina (UFSC), Florianópolis, SC 88040-900, Brazil^1
Department of Informatics and Statistic (INE), Federal University of Santa Catarina (UFSC), Florianópolis, SC 88040-900, Brazil^2
Keywords: Comparison study; Differential information; Distributed processing; Document Classification; Map-reduce programming; Open source system; Software environments; Unstructured documents
Others  :  https://iopscience.iop.org/article/10.1088/1742-6596/540/1/012001/pdf
DOI  :  10.1088/1742-6596/540/1/012001
Subject classification: Computer Science (General)
Source: IOP
【 Abstract 】

A well-known problem faced by organizations today is the high volume of available data and the processing required to transform that volume into differential information. This paper presents a case-comparison study of an automatic document classification (ADC) approach that uses both serial and parallel paradigms. The serial approach was implemented with the RapidMiner software tool, which is recognized as the world-leading open-source system for data mining. The parallel approach used the Hadoop software environment, which follows the MapReduce programming model. The main goal of this case-comparison study is to explore the differences between these two paradigms, especially when large volumes of data, such as Web text documents, are used to build a category database. Many studies in the literature point out that distributed processing of unstructured documents with Hadoop has been yielding efficient results. Results from our research indicate a threshold to such efficiency.
