| The International Arab Journal of Information Technology | |
| A Genetic Algorithm based Domain Adaptation Framework for Classification of Disaster Topic | |
| article | |
| Lokabhiram Dwarakanath1  Amirrudin Kamsin1  Liyana Shuib2  | |
| [1] Department of Computer System and Technology, Universiti Malaya; Department of Information Systems, Universiti Malaya | |
| Keywords: Crisis informatics; disaster management; machine learning; domain adaptation approaches; social media; genetic algorithm; | |
| DOI : 10.34028/iajit/20/1/7 | |
| Subject classification: Computer Science (General) | |
| Source: Zarqa University | |
【 Abstract 】
The ability to post short text and media messages on social media platforms such as Twitter and Facebook plays a huge role in the exchange of information following a mass emergency event such as a hurricane, earthquake, or tsunami. Disaster victims, families, and relief operation teams use social media to help and support one another. Despite the benefits offered by these communication media, disaster-topic posts (posts that indicate conversations about the disaster event in its aftermath) get lost in the deluge of posts, because the amount of data exchanged surges following a mass emergency event. This hampers the emergency relief effort, which in turn affects the delivery of useful information to the disaster victims. Research in emergency coordination via social media has received growing interest in recent years, mainly focusing on developing machine learning-based models that can separate disaster-related topic posts from non-disaster-related topic posts. Of these, supervised machine learning approaches perform well when the source disaster dataset used to train the model and the target disaster dataset are similar. However, in the real world this may not be feasible, as different disasters have different characteristics, so models developed using supervised machine learning approaches do not perform well on unseen disaster datasets. Therefore, domain adaptation approaches, which address this limitation by learning classifiers from unlabeled target data in addition to labeled source data, represent a promising direction for social media crisis data classification tasks. Existing domain adaptation techniques for the classification of disaster tweets are evaluated using single disaster event dataset pairs; self-training is then performed on the source-target dataset pairs by considering the highly confident instances in subsequent iterations of training. This could be improved with better feature engineering. Thus, this research proposes a Genetic Algorithm based Domain Adaptation Framework (GADA) for the classification of disaster tweets. The proposed GADA combines 1) a Hybrid Feature Selection component that uses the Genetic Algorithm (GA) and a Chi-Square Feature Evaluator for feature selection and 2) a Classifier component that uses Random Forest to separate disaster-related posts from noise on Twitter. The proposed framework addresses the challenge of the lack of labeled data in the target disaster event by proposing a Genetic Algorithm based approach. Experimental results on Twitter datasets corresponding to four disaster domain pairs show that the proposed framework improves the overall performance over previous supervised approaches and significantly reduces the training time compared with previous domain adaptation techniques that do not use the GA for feature selection.
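The abstract does not say how the Genetic Algorithm and the Chi-Square evaluator are combined, so the sketch below shows only one plausible arrangement and should not be read as the authors' implementation. It assumes binary term-presence features, a chi-square filter that shortlists candidate features, and a GA that searches subsets of that shortlist. To stay dependency-free, a nearest-centroid classifier stands in for the Random Forest when scoring a subset. All function names, the toy data, the size penalty, and the GA parameters are hypothetical, and the self-training step on unlabeled target data is not covered.

```python
import random

def chi_square(column, labels):
    """Chi-square statistic of one binary feature against binary labels (2x2 table)."""
    n = len(labels)
    a = sum(1 for x, t in zip(column, labels) if x == 1 and t == 1)
    b = sum(1 for x, t in zip(column, labels) if x == 1 and t == 0)
    c = sum(1 for x, t in zip(column, labels) if x == 0 and t == 1)
    d = n - a - b - c
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return 0.0 if denom == 0 else n * (a * d - b * c) ** 2 / denom

def chi_square_filter(X, y, k):
    """Indices of the k features with the highest chi-square score."""
    scores = [chi_square([row[j] for row in X], y) for j in range(len(X[0]))]
    return sorted(range(len(scores)), key=lambda j: -scores[j])[:k]

def accuracy(mask, candidates, X_train, y_train, X_val, y_val):
    """Validation accuracy of a nearest-centroid classifier (Random Forest stand-in)."""
    feats = [f for f, bit in zip(candidates, mask) if bit]
    if not feats:
        return 0.0
    centroids = {}
    for label in (0, 1):
        rows = [r for r, t in zip(X_train, y_train) if t == label]
        centroids[label] = [sum(r[f] for r in rows) / len(rows) for f in feats]
    correct = 0
    for r, t in zip(X_val, y_val):
        dist = {lab: sum((r[f] - m) ** 2 for f, m in zip(feats, cen))
                for lab, cen in centroids.items()}
        correct += min(dist, key=dist.get) == t
    return correct / len(y_val)

def genetic_select(candidates, fitness, pop_size=12, generations=15,
                   mutation_rate=0.1, seed=0):
    """Search bit masks over `candidates`; returns the selected feature indices."""
    rng = random.Random(seed)
    n = len(candidates)
    # Start from the full shortlist plus random masks.
    population = [[1] * n] + [[rng.randint(0, 1) for _ in range(n)]
                              for _ in range(pop_size - 1)]
    best = max(population, key=fitness)
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        parents = ranked[: pop_size // 2]           # truncation selection
        children = [ranked[0][:]]                   # elitism: keep the top mask
        while len(children) < pop_size:
            p1, p2 = rng.sample(parents, 2)
            cut = rng.randrange(1, n)               # one-point crossover
            child = p1[:cut] + p2[cut:]
            child = [bit ^ 1 if rng.random() < mutation_rate else bit
                     for bit in child]              # bit-flip mutation
            children.append(child)
        population = children
        best = max(population + [best], key=fitness)
    return [f for f, bit in zip(candidates, best) if bit]

# Toy data (hypothetical): rows are posts, columns are term-presence flags,
# label 1 = disaster-related, 0 = noise.
X_train = [[1, 1, 1, 0, 1], [1, 1, 0, 0, 1], [1, 1, 1, 0, 1], [1, 1, 0, 0, 0],
           [0, 1, 1, 1, 0], [0, 1, 0, 1, 0], [0, 1, 1, 1, 0], [0, 1, 0, 1, 0]]
y_train = [1, 1, 1, 1, 0, 0, 0, 0]
X_val = [[1, 1, 0, 0, 1], [1, 1, 1, 0, 0], [0, 1, 1, 1, 0], [0, 1, 0, 1, 0]]
y_val = [1, 1, 0, 0]

candidates = chi_square_filter(X_train, y_train, k=3)

def fitness(mask):
    # Accuracy minus a small (assumed) penalty per selected feature.
    return accuracy(mask, candidates, X_train, y_train, X_val, y_val) - 0.01 * sum(mask)

selected = genetic_select(candidates, fitness)
print("chi-square shortlist:", candidates)
print("GA-selected features:", selected)
```

The per-feature penalty in `fitness` is a design choice of this sketch: it nudges the GA toward smaller subsets, which is one way a feature-selection stage could shorten later training. Whether GADA uses a comparable objective is not stated in the abstract.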
【 License 】
Unknown