Journal of Big Data | Volume: 9
Traffic and road conditions monitoring system using extracted information from Twitter
Indra Budi [1], Rahmad Mahendra [1], Prabu Kresna Putra [1]
[1] Faculty of Computer Science, Universitas Indonesia
Keywords: Twitter; Text mining; Information extraction; Text classification; Geocoding; Road condition
DOI: 10.1186/s40537-022-00621-3
Source: DOAJ
【 Abstract 】
Congested roads and daily traffic jams disrupt traffic flow. A traffic monitoring system using closed-circuit television (CCTV) has been implemented, but the information it gathers is still of limited availability for public use. This research focuses on using Twitter data to monitor traffic and road conditions. Traffic-related information is extracted from social media using a text mining approach. The methods include Tweet classification to filter relevant data, location information extraction, and geocoding to convert text-based locations into coordinates that can be loaded into a Geographic Information System (GIS). We test several supervised classification algorithms in this study: Naïve Bayes, Random Forest, Logistic Regression, and Support Vector Machine. We experiment with Bag of Words (BOW) and Term Frequency-Inverse Document Frequency (TF-IDF) as the feature representations. The location information is extracted using Named Entity Recognition (NER) and a Part-of-Speech (POS) tagger. The geocoding is implemented using the ArcPy library. The best model for Tweet relevance classification is the Logistic Regression classifier with a feature combination of unigrams and character n-grams, achieving an F1-score of 93%. The NER-based location extractor obtains an F1-score of 54% with a precision of 96%. The geocoding success rate on the extracted location information is 68%. In addition, a web-based visualization is implemented to display traffic information through a spatial interface.
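The feature combination named in the abstract (word unigrams plus character n-grams, weighted by TF-IDF) can be illustrated with a minimal standard-library sketch. This is not the authors' implementation: the whitespace tokenizer, the n-gram length of 3, the `log(N / df)` form of the inverse document frequency, and all function names are assumptions chosen for illustration.

```python
import math
from collections import Counter


def unigrams(text):
    """Word unigrams: lowercased whitespace tokens (tokenizer is an assumption)."""
    return text.lower().split()


def char_ngrams(text, n=3):
    """Overlapping character n-grams of the lowercased text (n=3 is an assumption)."""
    s = text.lower()
    return [s[i:i + n] for i in range(len(s) - n + 1)]


def features(text, n=3):
    """Count word unigrams and character n-grams in one combined bag.

    The "w:" and "c:" prefixes keep the two feature families from colliding.
    """
    bag = Counter("w:" + tok for tok in unigrams(text))
    bag.update("c:" + gram for gram in char_ngrams(text, n))
    return bag


def tfidf(docs, n=3):
    """Weight raw counts by idf = log(N / df), one common TF-IDF variant."""
    bags = [features(d, n) for d in docs]
    # Document frequency: number of documents in which each feature occurs.
    df = Counter(feat for bag in bags for feat in bag)
    total = len(docs)
    return [
        {feat: count * math.log(total / df[feat]) for feat, count in bag.items()}
        for bag in bags
    ]
```

With this idf variant, a feature that occurs in every document receives weight zero, so only features that distinguish documents contribute to the vectors passed to a classifier such as Logistic Regression.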
【 License 】
Unknown