The International Arab Journal of Information Technology
A Novel Approach of Clustering Documents: Minimizing Computational Complexities in
Article
Mohammed Alghobiri [1], Khalid Mohiuddin [1], Mohammed Abdul Khaleel [2]
[1] Department of Management Information Systems, King Khalid University; Department of Computer Science, King Khalid University
Keywords: clustering algorithms; document categorization; document clustering; Hamiltonian graph; similarity measure
DOI: 10.34028/iajit/19/4/6
Subject classification: Computer science (general)
Source: Zarqa University
[Abstract]
This study addresses the real-time issue of managing an academic program's documents in a university environment. In practice, classifying documents from a corpus is challenging when the dataset is large, and the complexity increases when specific document management requirements must be met. This study presents a practical approach to grouping documents based on a content similarity measure. The approach analyzes the performance of state-of-the-art clustering algorithms and considers Hamiltonian graph properties and a distance function. The distance function measures (1) the content similarity between documents and (2) the distances between the produced clusters. The proposed algorithm improves cluster quality by applying Hamiltonian graph properties. A significant characteristic of the proposed function is that it determines the document types from the corpus; hence, the number of clusters need not be assumed before the algorithm is executed. The approach avoids the arbitrary initial choice of k centroids required by the k-means algorithm, reduces computational complexity, and overcomes some limitations of commonly used clustering algorithms. It offers information systems developers an effective way to organize documents when designing document management systems.
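As a rough illustration of grouping documents by content similarity without fixing the number of clusters in advance, the sketch below links any two documents whose distance falls under a threshold and reports the resulting connected groups. It is not the authors' algorithm: the bag-of-words cosine distance, the `threshold` parameter, and the function names `cosine_distance` and `cluster_by_threshold` are assumptions made for this example, and the Hamiltonian-graph refinement described in the abstract is not shown.

```python
import math
from collections import Counter


def cosine_distance(a: str, b: str) -> float:
    """Return 1 - cosine similarity of two bag-of-words term-count vectors."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm = (math.sqrt(sum(c * c for c in va.values()))
            * math.sqrt(sum(c * c for c in vb.values())))
    return 1.0 - dot / norm if norm else 1.0


def cluster_by_threshold(docs, threshold=0.5):
    """Group document indices into connected components of the graph whose
    edges join documents at distance <= threshold (hypothetical parameter).
    The number of groups follows from the data, not from a preset k."""
    parent = list(range(len(docs)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(docs)):
        for j in range(i + 1, len(docs)):
            if cosine_distance(docs[i], docs[j]) <= threshold:
                parent[find(i)] = find(j)

    groups = {}
    for i in range(len(docs)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


docs = [
    "course syllabus for database systems",
    "database systems course syllabus outline",
    "exam schedule spring term",
    "spring term exam schedule final",
]
print(cluster_by_threshold(docs))  # [[0, 1], [2, 3]]
```

The threshold plays the role that k plays in k-means: it is still a choice, but it is expressed in terms of similarity rather than as a cluster count, so the number of document types emerges from the corpus.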
[License]
Unknown