Journal article details
South African Journal of Science
Automatic classification of social media reports on violent incidents in South Africa using machine learning
Keywords: WhatsApp; text classification; word2vec; protests; open-source intelligence; OSINT
DOI: https://doi.org/10.17159/sajs.2020/6557
Source: DOAJ
【Abstract】

With the growing amount of data available in the digital age, it has become increasingly important to use automated methods to extract useful information from data. One such application is the extraction of events from news sources for quantitative analysis, which removes the need for someone to read through thousands of news articles. Overseas, projects such as the Integrated Crisis Early Warning System (ICEWS) monitor news stories and extract events using automated coding. However, not all violent events are reported in the news: while monitoring news agencies alone is sufficient for projects with a global focus, such as ICEWS, additional sources are required when assessing a local situation. We used WhatsApp as a news source to identify the occurrence of violent incidents in South Africa. Using machine learning, we show how violent incidents can be coded and recorded, allowing these events to be tracked at the local level over time. Our experimental results show good performance on both the training and testing data sets using a logistic regression classifier with unigram and Word2vec feature models. Future work will evaluate the inclusion of pre-trained word embeddings for both Afrikaans and English to improve the performance of the machine learning classifier.
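The abstract names a logistic regression classifier fed with unigram and Word2vec features. The sketch below illustrates that general technique under loudly stated assumptions; it is not the authors' pipeline. All function names are hypothetical, tokenisation is a plain lowercase whitespace split, training is unregularised batch gradient descent, and the six messages are invented English examples rather than the study's WhatsApp data. The `average_embedding` helper is a hypothetical stand-in for a Word2vec document feature (averaging word vectors is one common choice, not necessarily the one used in the paper); the demo itself uses unigram counts only.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase + whitespace split; real WhatsApp text needs more cleaning.
    return text.lower().split()


def build_vocab(docs):
    # Map each distinct unigram in the training messages to a column index.
    vocab = {}
    for doc in docs:
        for tok in tokenize(doc):
            vocab.setdefault(tok, len(vocab))
    return vocab


def unigram_features(text, vocab):
    # Bag-of-words count vector; unigrams not seen in training are ignored.
    vec = [0.0] * len(vocab)
    for tok, count in Counter(tokenize(text)).items():
        if tok in vocab:
            vec[vocab[tok]] = float(count)
    return vec


def average_embedding(text, embeddings, dim):
    # Hypothetical Word2vec-style feature: mean of the vectors of known words.
    vectors = [embeddings[t] for t in tokenize(text) if t in embeddings]
    if not vectors:
        return [0.0] * dim
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic_regression(X, y, lr=0.5, epochs=500):
    # Batch gradient descent on the unregularised log-loss.
    n = len(X[0])
    w, b = [0.0] * n, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(n):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / len(X) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / len(X)
    return w, b


def predict(w, b, x):
    # 1 = violent incident, 0 = not violent (probability threshold 0.5).
    return int(sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= 0.5)


# Invented toy messages, NOT the paper's data.
train_texts = [
    "protesters burning tyres on main road",
    "shots fired near taxi rank",
    "crowd throwing stones at police",
    "meeting moved to friday afternoon",
    "happy birthday have a great day",
    "please send the minutes tonight",
]
train_labels = [1, 1, 1, 0, 0, 0]

vocab = build_vocab(train_texts)
X = [unigram_features(t, vocab) for t in train_texts]
w, b = train_logistic_regression(X, train_labels)

print(predict(w, b, unigram_features("police fired at protesters", vocab)))  # → 1
print(predict(w, b, unigram_features("happy friday everyone", vocab)))       # → 0
```

Swapping `unigram_features` for `average_embedding` (given some word-vector table) changes only the feature model; the classifier is untouched. A library implementation, for example scikit-learn's `CountVectorizer` with `LogisticRegression`, would add regularisation and a better optimiser; the hand-rolled version only makes the mechanics visible.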

【License】

Unknown   
