2017 1st International Conference on Engineering and Applied Technology
Named entity recognition model for Indonesian tweet using CRF classifier
Munarko, Y.^1 ; Sutrisno, M.S.^1 ; Mahardika, W.A.I.^1 ; Nuryasin, I.^1 ; Azhar, Y.^1
Teknik Informatika, Universitas Muhammadiyah Malang, Indonesia^1
Others  :  https://iopscience.iop.org/article/10.1088/1757-899X/403/1/012067/pdf
DOI  :  10.1088/1757-899X/403/1/012067
Named Entity Recognition (NER) is a Natural Language Processing (NLP) task that identifies the named entities mentioned in a document. NER supports downstream activities such as information extraction and text summarization. One data source for NLP is tweets, which are produced in real time and at high frequency but are limited in the number of words per tweet. In Indonesia, Twitter is one of the most popular social media platforms and covers a wide range of topics, so models, training data, and test data for Indonesian tweets are needed. In this study, the models were built with a Conditional Random Field (CRF) classifier from 8,000 tweets that were grouped into formal tweets and informal tweets. When the models were tested on 2,000 training data items, recall and precision were 62% and 87% respectively for formal tweets, 36% and 90% respectively for informal tweets, and 60% and 86% respectively for mixed tweets. These results indicate that the Indonesian tweet models can be used for automatic NER, with precision consistently higher than recall.
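To illustrate how precision and recall can diverge as they do above, here is a minimal sketch in Python. It is not the authors' implementation: the feature set in `token_features`, the label names (`PERSON`, `LOCATION`, `O`), the example tweet, and the token-level scoring are all assumptions made for illustration, since the abstract does not specify them. The first function shows the kind of per-token features a CRF tagger typically consumes; the second computes precision and recall over entity labels.

```python
from typing import Dict, List, Tuple


def token_features(tokens: List[str], i: int) -> Dict[str, object]:
    """Hypothetical per-token features of the kind a CRF tagger consumes."""
    word = tokens[i]
    return {
        "lower": word.lower(),
        "is_title": word.istitle(),          # capitalised words often start entities
        "is_upper": word.isupper(),
        "is_mention": word.startswith("@"),  # tweet-specific cues
        "is_hashtag": word.startswith("#"),
        "suffix3": word[-3:].lower(),
        "prev": tokens[i - 1].lower() if i > 0 else "<BOS>",
        "next": tokens[i + 1].lower() if i < len(tokens) - 1 else "<EOS>",
    }


def precision_recall(
    gold: List[str], pred: List[str], outside: str = "O"
) -> Tuple[float, float]:
    """Token-level precision and recall over all non-outside labels."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    # A true positive is a predicted entity label that matches the gold label.
    true_pos = sum(1 for g, p in zip(gold, pred) if p != outside and g == p)
    predicted = sum(1 for p in pred if p != outside)
    actual = sum(1 for g in gold if g != outside)
    precision = true_pos / predicted if predicted else 0.0
    recall = true_pos / actual if actual else 0.0
    return precision, recall


# Invented example tweet and labels (assumptions, not data from the study).
tokens = ["Budi", "tiba", "di", "Malang", "bersama", "@contoh_akun"]
gold = ["PERSON", "O", "O", "LOCATION", "O", "PERSON"]
pred = ["PERSON", "O", "O", "O", "O", "O"]  # a cautious tagger: few predictions

features = [token_features(tokens, i) for i in range(len(tokens))]
precision, recall = precision_recall(gold, pred)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

In this toy case the tagger labels only one token, and labels it correctly, so precision is perfect while two of the three gold entities are missed. The same pattern, at a smaller scale, matches the reported gap between precision and recall, which is widest for informal tweets.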

