Journal of Computer Science | |
MATCHING LSI FOR SCALABLE INFORMATION RETRIEVAL | Science Publications | |
Rajagopal Palsonkennedy1  T. V. Gopal1  | |
关键词: SVD; K-Means Cluster; T-Dmatrix; Mapreduce; LSI; Information Retrieval; | |
DOI : 10.3844/jcssp.2012.2083.2097 | |
学科分类:计算机科学(综合) | |
来源: Science Publications | |
【 摘 要 】
Latent Semantic Indexing (LSI) is one of the well-liked techniques in the information retrieval fields. Different from the traditional information retrieval techniques, LSI is not based on the keyword matching simply. It uses statistics and algebraic computations. Based on Singular Value Decomposition (SVD), the higher dimensional matrix is converted to a lower dimensional approximate matrix, of which the noises could be filtered. And also the issues of synonymy and polysemy in the traditional techniques can be prevail over based on the investigations of the terms related with the documents. However, it is notable that LSI suffers a scalability issue due to the computing complexity of SVD. This study presents a distributed LSI algorithm MR-LSI which can solve the scalability issue using Hadoop framework based on the distributed computing model Map Reduce. It also solves the overhead issue caused by the involved clustering algorithm by k-means algorithm. The evaluations indicate that MR-LSI can gain noteworthy improvement compared to the other scheme on processing large scale of documents. One significant advantage of Hadoop is that it supports various computing environments so that the issue of unbalanced load among nodes is highlighted.Hence, a load balancing algorithm based on genetic algorithm for balancing load in static environment is proposed. The results show that it can advance the performance of a cluster according to different levels.
【 授权许可】
Unknown
【 预 览 】
Files | Size | Format | View |
---|---|---|---|
RO201911300721625ZK.pdf | 571KB | download |