Information
A Comparative Study of Arabic Part of Speech Taggers Using Literary Text Samples from Saudi Novels
Mohammad A. R. Abdeen [1]; Reyadh Alluhaibi [2]; Tareq Alfraidi [3]; Ahmed Yatimi [4]
[1] Department of Computer Science, Islamic University of Madinah, Madinah 42351, Saudi Arabia; [2] Department of Computer Science, Taibah University, Madinah 41477, Saudi Arabia; [3] Department of Linguistics, Islamic University of Madinah, Madinah 42351, Saudi Arabia; [4] Department of Literature, Islamic University of Madinah, Madinah 42351, Saudi Arabia
Keywords: Arabic tagger; Part of Speech; Saudi novel; performance evaluation
DOI: 10.3390/info12120523
Source: DOAJ
Abstract
Part of Speech (POS) tagging is one of the most common techniques used in natural language processing (NLP) applications and corpus linguistics. Various POS tagging tools have been developed for Arabic. These taggers differ in several respects, such as their modeling techniques, tag sets, and training and testing data. In this paper we conduct a comparative study of five Arabic POS taggers, namely Stanford Arabic, CAMeL Tools, Farasa, MADAMIRA and Arabic Linguistic Pipeline (ALP), and examine their performance using text samples from Saudi novels. The testing data were extracted from different novels that represent different types of narration. Our main result indicates that the ALP tagger performs better than the others in this particular case, and that Adjective is the most frequently mistagged POS type compared with Noun and Verb.
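A comparison of this kind needs, for each tagger, an overall accuracy figure and a breakdown of errors by gold POS type. The following is a minimal sketch of how such figures might be computed; it is not the authors' evaluation code. The function names, the coarse tag labels (`NOUN`, `VERB`, `ADJ`) and the toy data are all illustrative assumptions, and the sketch further assumes that each tagger's output has already been aligned token by token with the gold annotation and mapped onto a shared tag set.

```python
from collections import Counter


def overall_accuracy(gold_tags, predicted_tags):
    """Fraction of tokens whose predicted tag equals the gold tag."""
    if len(gold_tags) != len(predicted_tags):
        raise ValueError("gold and predicted sequences must be aligned")
    if not gold_tags:
        raise ValueError("cannot score an empty sequence")
    correct = sum(g == p for g, p in zip(gold_tags, predicted_tags))
    return correct / len(gold_tags)


def per_tag_errors(gold_tags, predicted_tags):
    """Map each gold tag to (mistagged count, total count, error rate)."""
    if len(gold_tags) != len(predicted_tags):
        raise ValueError("gold and predicted sequences must be aligned")
    totals = Counter(gold_tags)
    # Count a token as an error under its gold tag when the prediction differs.
    errors = Counter(g for g, p in zip(gold_tags, predicted_tags) if g != p)
    return {
        tag: (errors[tag], totals[tag], errors[tag] / totals[tag])
        for tag in totals
    }


# Toy, invented data (not taken from the paper): six tokens, one ADJ mistagged as NOUN.
gold = ["NOUN", "VERB", "ADJ", "NOUN", "ADJ", "VERB"]
predicted = ["NOUN", "VERB", "NOUN", "NOUN", "ADJ", "VERB"]

accuracy = overall_accuracy(gold, predicted)
breakdown = per_tag_errors(gold, predicted)
```

Running the same two functions over each tagger's output on the same gold sample would give comparable per-tagger accuracies, and the per-tag breakdown shows which POS type accounts for most errors, either by raw count or by rate.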
License
Unknown