Journal Article Details
Malaysian Journal of Computer Science
Exhaustive Affix Stripping And A Malay Word Register To Solve Stemming Errors And Ambiguity Problem In Malay Stemmers
Rukaini Abdullah, Norisma Idris, Salhana Amad Darwis
Keywords: Malay language; stemming; Malay Language stemmers; Malay word register; ambiguity problem; under stemming; over stemming
DOI  :  
Subject classification: Social Sciences, Humanities and Arts (General)
Source: University of Malaya, Faculty of Computer Science and Information Technology
【 Abstract 】

Stemmers, or word stemming algorithms, reduce a derivative word to its root word by removing all of its affixes. The complexity of Malay Language (ML) morphological rules and of the Malay lexicon makes stemming Malay words difficult. There is no fixed method for determining which affix should be removed from a derivative word to produce the correct root word. Furthermore, a derivative word could contain one or more valid root words. Stemming errors still exist in previous Malay Language Stemmers (MLS). Regardless of the approach used, they rely on the first affix matched or the first root word found. Hence, some words were under-stemmed or over-stemmed, while words with several valid root words were not stemmed to reveal the correct root word. This multiple-root-word, or ambiguity, problem has never been addressed by previous MLS. To solve the over-stemming and under-stemming errors, we propose an approach that exhaustively strips all matched affixes to ensure that a valid root word is extracted. In addition, we propose the use of a Malay Word Register to address the ambiguity problem of determining the correct root word. We tested the proposed approach with words from newspaper articles, a Malay translation of the Quran, history essays, and words incorrectly stemmed by previous stemmers. The results show that the stemmer achieves 99.8% accuracy. There were no stemming errors; the shortfall from perfect accuracy is due to the approach used for the ambiguity problem.
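
The abstract does not give the algorithm in detail, so the following Python sketch is only an illustration of the two ideas it names: trying every matching affix instead of stopping at the first match, and consulting a word register when more than one valid root is found. The affix lists, the root dictionary, the register contents, the names `candidate_roots` and `stem`, and the rule of leaving an unresolved word unchanged are all assumptions made for this example, not the authors' implementation.

```python
# Illustrative, heavily reduced affix lists (assumption: the real stemmer
# uses a much fuller set of Malay prefixes, suffixes and spelling rules).
PREFIXES = ("meng", "mem", "men", "me", "ber", "be", "ter", "di", "per", "pe", "ke", "se")
SUFFIXES = ("kan", "an", "i", "nya", "lah")

# Hypothetical root-word dictionary.
ROOT_WORDS = {"beri", "ikan", "ajar", "makan"}

# Hypothetical word register: maps an ambiguous derivative word to the
# root it should resolve to.
WORD_REGISTER = {"berikan": "beri"}


def candidate_roots(word, roots=ROOT_WORDS):
    """Follow every possible chain of affix removals and collect each
    intermediate form that is a known root word."""
    seen = set()
    found = set()
    stack = [word]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        if current in roots:
            found.add(current)
        # Try every matching prefix, not just the first one.
        for prefix in PREFIXES:
            if current.startswith(prefix) and len(current) - len(prefix) >= 2:
                stack.append(current[len(prefix):])
        # Try every matching suffix as well.
        for suffix in SUFFIXES:
            if current.endswith(suffix) and len(current) - len(suffix) >= 2:
                stack.append(current[:-len(suffix)])
    return found


def stem(word, roots=ROOT_WORDS, register=WORD_REGISTER):
    """Return the root of `word`, using the register to break ties."""
    word = word.lower()
    candidates = candidate_roots(word, roots)
    if not candidates:
        return word  # no valid root found: leave the word as it is
    if len(candidates) == 1:
        return next(iter(candidates))
    preferred = register.get(word)
    if preferred in candidates:
        return preferred
    return word  # ambiguity not covered by the register: left unresolved


if __name__ == "__main__":
    print(sorted(candidate_roots("berikan")))
    print(stem("berikan"))
    print(stem("mengajar"))
```

In this sketch "berikan" yields two valid roots, "ikan" (reached by removing the prefix "ber") and "beri" (reached by removing the suffix "kan"). A stemmer that stops at the first match would return whichever it happens to reach first, whereas here the register entry selects "beri".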

【 License 】

Unknown   
