2017 International Conference on Control Engineering and Artificial Intelligence
Proposal: A Hybrid Dictionary Modelling Approach for Malay Tweet Normalization
Computer Science
Binti Muhamad, Nor Azlizawati^1 ; Idris, Norisma^1 ; Saloot, Mohammad Arshi^1
University of Malaya, Kuala Lumpur 50603, Malaysia^1
Keywords: Dictionary modelling; Language model; Malay languages; N-grams
Others: https://iopscience.iop.org/article/10.1088/1742-6596/806/1/012008/pdf
DOI: 10.1088/1742-6596/806/1/012008
Subject classification: Computer Science (General)
Source: IOP
[Abstract]
Malay Twitter messages deviate markedly from standard Malay. Tweets written in Malay are now widely used by Twitter users, especially in the Malay Archipelago. It is therefore important to build a normalization system that can translate the language of Malay tweets into standard Malay. Research in natural language processing has mainly focused on normalizing English Twitter messages, while few studies have addressed the normalization of Malay tweets. This paper proposes an approach to normalizing Malay Twitter messages based on hybrid dictionary modelling methods. The approach normalizes noisy Malay Twitter messages, which contain colloquial language, novel words, and interjections, into standard Malay. The research will use a language model and an N-gram model.
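The abstract does not spell out how the dictionary and the language model interact, so the following is only a minimal sketch of one plausible combination, not the authors' method. It assumes a lookup table that maps a noisy token to one or more standard candidates, and a bigram model with add-one smoothing that picks among candidates greedily, left to right. The names `CANDIDATES`, `CORPUS`, `train_bigrams`, `bigram_prob`, and `normalize` are hypothetical, and the tiny dictionary and corpus are illustrative data only.

```python
from collections import Counter

# Hypothetical toy lookup table: noisy token -> candidate standard forms.
# A real system would use a much larger, curated dictionary.
CANDIDATES = {
    "sy": ["saya"],
    "x": ["tidak"],
    "nk": ["hendak"],
    "mkn": ["makan", "makin"],  # ambiguous short form
}

# Hypothetical toy corpus of standard Malay used to estimate bigram counts.
CORPUS = [
    "saya tidak hendak makan",
    "saya hendak makan nasi",
    "harga makin naik",
]


def train_bigrams(sentences):
    """Count unigrams and bigrams, with a start-of-sentence marker."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split()
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams


def bigram_prob(prev, word, unigrams, bigrams, vocab_size):
    """Add-one smoothed estimate of P(word | prev)."""
    return (bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab_size)


def normalize(tweet, unigrams, bigrams):
    """Replace each noisy token with its most probable dictionary candidate.

    Tokens missing from the dictionary are kept unchanged. The choice is
    greedy: each token is scored only against the previously chosen word.
    """
    vocab_size = len(unigrams)
    prev = "<s>"
    output = []
    for token in tweet.lower().split():
        options = CANDIDATES.get(token, [token])
        best = max(
            options,
            key=lambda w: bigram_prob(prev, w, unigrams, bigrams, vocab_size),
        )
        output.append(best)
        prev = best
    return " ".join(output)


UNIGRAMS, BIGRAMS = train_bigrams(CORPUS)
print(normalize("sy x nk mkn", UNIGRAMS, BIGRAMS))     # saya tidak hendak makan
print(normalize("harga mkn naik", UNIGRAMS, BIGRAMS))  # harga makin naik
```

In this sketch the dictionary supplies the candidates and the N-gram model resolves ambiguity from context: the same short form `mkn` becomes `makan` after `hendak` but `makin` after `harga`. A fuller system would likely score whole candidate sequences rather than choosing greedily.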