International Conference on Computing and Applied Informatics 2016 | |
Topic Identification and Categorization of Public Information in Community-Based Social Media | |
Physics; Computer Science | |
Kusumawardani, R.P.^1 ; Basri, M.H.^1 | |
Department of Information Systems, Institut Teknologi Sepuluh Nopember (ITS), Kampus ITS, Jl. Raya ITS, Keputih - Sukolilo Surabaya, Indonesia^1 | |
Keywords: Community-based; ITS applications; k-Means algorithm; Linguistic resources; Public information; Semi-supervised method; Topic identification; Wealth of information; | |
Others : https://iopscience.iop.org/article/10.1088/1742-6596/801/1/012075/pdf DOI : 10.1088/1742-6596/801/1/012075 | |
Subject classification: Computer Science (General) | |
Source: IOP | |
【 Abstract 】
This paper presents a semi-supervised method for topic identification and classification of short texts on social media, and its application to tweets containing dialogues within a large community of city residents, written mostly in Indonesian. These dialogues hold a wealth of information about the city, shared in real time. We found that, despite the high irregularity of the language used and the scarcity of suitable linguistic resources, topics could be meaningfully identified by clustering the tweets with the K-Means algorithm. The resulting clusters proved robust enough to serve as the basis for classification. On three grouping schemes derived from the clusters, linear SVMs achieve accuracies of 95.52%, 95.51%, and 96.7%, indicating that the method is applicable to topic identification and classification on such data.
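The clustering step named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the bag-of-words features, the L2 normalization, the fixed initial centroids, and the toy English messages are all assumptions made for illustration, and the abstract does not describe the preprocessing actually used. The helper names `vectorize`, `squared_distance`, and `kmeans` are hypothetical.

```python
import math
from collections import Counter


def vectorize(texts):
    """Turn each text into an L2-normalized bag-of-words vector (assumed features)."""
    vocab = sorted({tok for t in texts for tok in t.lower().split()})
    index = {tok: i for i, tok in enumerate(vocab)}
    vectors = []
    for t in texts:
        counts = Counter(t.lower().split())
        vec = [0.0] * len(vocab)
        for tok, c in counts.items():
            vec[index[tok]] = float(c)
        norm = math.sqrt(sum(x * x for x in vec)) or 1.0
        vectors.append([x / norm for x in vec])
    return vectors


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, initial_centroids, max_iter=100):
    """Lloyd's algorithm: alternate assignment and centroid update until stable."""
    centroids = [list(c) for c in initial_centroids]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each vector goes to its nearest centroid.
        new_labels = [
            min(range(len(centroids)),
                key=lambda j: squared_distance(v, centroids[j]))
            for v in vectors
        ]
        if new_labels == labels:
            break  # assignments no longer change
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for j in range(len(centroids)):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Hypothetical toy messages standing in for community tweets.
texts = [
    "traffic jam on main road",
    "heavy traffic near main bridge",
    "road closed traffic diverted",
    "lost wallet found at station",
    "found lost keys at station",
    "lost cat found near park",
]
vectors = vectorize(texts)
# Fixed initial centroids (an assumption) keep the run deterministic.
labels, centroids = kmeans(vectors, initial_centroids=[vectors[0], vectors[3]])
print(labels)  # [0, 0, 0, 1, 1, 1]
```

In a pipeline like the one the abstract describes, cluster labels of this kind would be grouped into topic categories and then used as training targets for a classifier such as a linear SVM (for example, scikit-learn's `LinearSVC`). That step is omitted here.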