Journal Article Details
Information
Enhancing the Performance of Telugu Named Entity Recognition Using Gazetteer Features
Lalita Bhanu Murthy Neti [1], SaiKiranmai Gorla [1], Aruna Malapati [1]
[1] Department of Computer Science and Information Systems, Birla Institute of Technology and Science Pilani, Hyderabad Campus, Telangana 500078, India
Keywords: information extraction; named entity recognition; Telugu language; gazetteer; support vector machine; conditional random field; margin infused relaxed algorithm
DOI: 10.3390/info11020082
Source: DOAJ
【Abstract】

Named entity recognition (NER) is a fundamental step for many natural language processing tasks, so improving the performance of NER models is always valuable. Because only limited resources are available, NER for South-East Asian languages like Telugu is a challenging problem. This paper attempts to improve NER performance for Telugu using gazetteer-related features, which are generated automatically from Wikipedia pages. We use these gazetteer features along with other well-known features, such as contextual, word-level, and corpus features, to build NER models. The models are developed using three well-known classifiers: conditional random field (CRF), support vector machine (SVM), and margin infused relaxed algorithm (MIRA). The gazetteer features are shown to improve performance, and the MIRA-based NER model fared better than its SVM and CRF counterparts.
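
The feature combination described in the abstract can be illustrated with a minimal sketch. It assumes a per-token feature dictionary of the kind usually passed to a CRF or SVM toolkit. The function name `token_features`, the feature names, the tiny gazetteers, and the transliterated example tokens are all hypothetical placeholders, not the paper's actual feature set or data.

```python
def token_features(tokens, i, gazetteers):
    """Build a feature dict for tokens[i]. All feature names are illustrative."""
    word = tokens[i]
    feats = {
        # Word-level features
        "word": word,
        "prefix2": word[:2],
        "suffix2": word[-2:],
        "is_digit": word.isdigit(),
        # Contextual features: neighbouring tokens, with boundary markers
        "prev_word": tokens[i - 1] if i > 0 else "<BOS>",
        "next_word": tokens[i + 1] if i < len(tokens) - 1 else "<EOS>",
    }
    # Gazetteer features: one boolean per entity-type list
    for label, entries in gazetteers.items():
        feats["in_gaz_" + label] = word in entries
    return feats


# Hypothetical gazetteers; the paper derives its lists from Wikipedia pages.
gazetteers = {"LOC": {"Hyderabad"}, "PER": {"Ramudu"}}

# Transliterated placeholder sentence, standing in for Telugu script.
sentence = ["Ramudu", "Hyderabad", "velladu"]
features = [token_features(sentence, i, gazetteers) for i in range(len(sentence))]
```

Each dictionary in `features` could then be fed to a sequence labeller. Exact-match lookup is the simplest possible gazetteer feature and is used here only for illustration.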

【License】

Unknown   
