Conference Paper Details
International Conference on Computing and Applied Informatics 2016
Topic Identification and Categorization of Public Information in Community-Based Social Media
Physics; Computer Science
Kusumawardani, R.P.^1 ; Basri, M.H.^1
Department of Information Systems, Institut Teknologi Sepuluh Nopember (ITS), Kampus ITS, Jl. Raya ITS, Keputih - Sukolilo Surabaya, Indonesia^1
Keywords: Community-based; ITS applications; k-Means algorithm; Linguistic resources; Public information; Semi-supervised method; Topic identification; Wealth of information
Others: https://iopscience.iop.org/article/10.1088/1742-6596/801/1/012075/pdf
DOI: 10.1088/1742-6596/801/1/012075
Subject classification: Computer Science (General)
Source: IOP
【 Abstract 】

This paper presents a semi-supervised method for topic identification and classification of short social media texts, and applies it to tweets containing dialogues among a large community of city dwellers, written mostly in Indonesian. These dialogues hold a wealth of information about the city, shared in real time. We found that, despite the high irregularity of the language used and the scarcity of suitable linguistic resources, topics could be meaningfully identified by clustering the tweets with the K-Means algorithm. The resulting clusters are robust enough to serve as the basis of a classification. On three grouping schemes derived from the clusters, linear SVMs achieve accuracies of 95.52%, 95.51%, and 96.7%, indicating that this method is applicable to topic identification and classification on such data.
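
The clustering step described in the abstract can be illustrated with a minimal sketch that uses only the Python standard library. It is not the authors' implementation: the abstract does not state how tweets were represented, so the TF-IDF weighting, the six toy tweets, the choice of k = 2, and the fixed starting centroids (`init_indices`, used here only to make the run deterministic) are all assumptions made for illustration. The classification stage, in which cluster-derived labels train a linear SVM, is not shown.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Turn whitespace-tokenised texts into L2-normalised sparse TF-IDF dicts."""
    tokenised = [doc.lower().split() for doc in docs]
    n = len(tokenised)
    # Document frequency: in how many texts each term occurs.
    df = Counter(term for tokens in tokenised for term in set(tokens))
    vectors = []
    for tokens in tokenised:
        tf = Counter(tokens)
        vec = {t: c * (math.log((1 + n) / (1 + df[t])) + 1) for t, c in tf.items()}
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        vectors.append({t: w / norm for t, w in vec.items()})
    return vectors


def sq_dist(a, b):
    """Squared Euclidean distance between two sparse vectors."""
    return sum((a.get(k, 0.0) - b.get(k, 0.0)) ** 2 for k in set(a) | set(b))


def mean(vectors):
    """Component-wise mean of a non-empty list of sparse vectors."""
    total = Counter()
    for vec in vectors:
        for term, weight in vec.items():
            total[term] += weight
    return {term: weight / len(vectors) for term, weight in total.items()}


def kmeans(vectors, k, init_indices, max_iter=100):
    """Lloyd's K-Means; init_indices selects the starting centroids (assumption)."""
    centroids = [dict(vectors[i]) for i in init_indices]
    labels = None
    for _ in range(max_iter):
        # Assignment step: attach each vector to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: sq_dist(vec, centroids[c])) for vec in vectors
        ]
        if new_labels == labels:
            break  # Assignments stopped changing, so the algorithm has converged.
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [vec for vec, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = mean(members)
    return labels, centroids


def top_terms(centroid, n=2):
    """Highest-weighted terms of a centroid, a rough description of its topic."""
    return sorted(centroid, key=centroid.get, reverse=True)[:n]


# Hypothetical toy tweets (not from the paper): three on traffic, three on weather.
tweets = [
    "macet parah di jalan raya darmo",
    "jalan ahmad yani macet total",
    "macet panjang di jalan diponegoro",
    "hujan deras di surabaya timur",
    "hujan angin deras sore ini",
    "hujan deras dan angin kencang",
]

labels, centroids = kmeans(tfidf_vectors(tweets), k=2, init_indices=(0, 3))
for cluster_id, centroid in enumerate(centroids):
    print(cluster_id, top_terms(centroid))
```

In a pipeline of the kind the abstract outlines, the resulting cluster labels, after manual inspection and grouping, would serve as training labels for a classifier. On real tweets, normalising slang and spelling variants before vectorising would likely matter more than it does in this toy data.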
