Journal of Applied Linguistics and Lexicography
Automatic recognition of messages from virtual communities of drug addicts
Виктория Игоревна Фирсанова [1]
[1] Saint Petersburg State University
Keywords: text classification; Word Embeddings; Bag-of-Words; Convolutional Neural Networks; supervised learning; text categorisation
DOI: 10.33910/2687-0215-2020-2-1-16-27
Source: DOAJ
【 Abstract 】
The paper describes the construction of a binary classifier based on a Convolutional Neural Network (CNN) using two different types of word vector representations, Bag-of-Words and Word Embeddings. The purpose of the classifier is to recognise messages published in virtual communities of drug-addicted people. The system may find application in healthcare as a tool for the automatic identification of addicts’ communities. It may also provide insights into the features of addicts’ online discourse. The classifier is built on a dataset collected from Russian-speaking online communities on VK (VKontakte). The dataset comprises the texts of publications and comments posted in two types of open communities. The first type includes communities that actively discuss problems of addiction to psychotropic and psychoactive substances. The second type focuses on the discussion of private issues: users share their life stories and ask for help or advice. In the latter case, the publications are not related to drug addiction. The experiments centred on the development, evaluation and comparative analysis of two models, based on Bag-of-Words and Word Embeddings, respectively. The neural network was trained on a Tesla T4 graphics processing unit on the Google Colab platform. The best-performing model achieved an F1-Score of 0.99 and an Accuracy of 0.95; however, testing of the programme revealed a few weaknesses. The programme still requires retraining on a supplemented dataset that includes publications collected from both addicts’ and non-addicts’ communities describing various mental conditions, including depression, anxiety and nervous disorders. This opens up an opportunity to create software that can automatically distinguish publications made by people struggling with depression caused by the use of psychoactive substances from publications made by people suffering from depressive disorders of a different kind.
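The two input representations compared in the abstract, and the two reported metrics, can be illustrated with a minimal sketch in plain Python. This is not the paper's code: the function names, the whitespace tokeniser, the toy English sentences and the toy labels are all assumptions made for illustration. A Bag-of-Words vector discards word order and keeps only counts, whereas the input to a word-embedding layer is an order-preserving sequence of token ids.

```python
from collections import Counter

def build_vocab(texts):
    """Assign each distinct lowercase token an integer id starting at 1 (id 0 is reserved)."""
    vocab = {}
    for text in texts:
        for token in text.lower().split():  # naive whitespace tokeniser (assumption)
            if token not in vocab:
                vocab[token] = len(vocab) + 1
    return vocab

def bag_of_words(text, vocab):
    """Order-free count vector of length len(vocab) + 1; slot 0 counts unknown tokens."""
    counts = Counter(vocab.get(tok, 0) for tok in text.lower().split())
    return [counts.get(i, 0) for i in range(len(vocab) + 1)]

def index_sequence(text, vocab, max_len):
    """Order-preserving id sequence, truncated or zero-padded to max_len.

    An embedding layer would map each id to a dense vector. In this sketch
    id 0 stands for both padding and unknown tokens.
    """
    ids = [vocab.get(tok, 0) for tok in text.lower().split()][:max_len]
    return ids + [0] * (max_len - len(ids))

def accuracy_and_f1(y_true, y_pred):
    """Accuracy and F1 for binary labels, with 1 as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    accuracy = correct / len(y_true)
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom else 0.0
    return accuracy, f1

# Toy data, purely hypothetical
texts = ["i need advice about my job", "need advice need help"]
vocab = build_vocab(texts)
bow = bag_of_words("need advice need help", vocab)       # counts only, no order
seq = index_sequence("need advice need help", vocab, 6)  # ordered ids, padded
acc, f1 = accuracy_and_f1([1, 1, 0, 0], [1, 0, 0, 0])
```

Accuracy and F1 can diverge on the same predictions because F1 ignores true negatives, so the two figures reported for a model are not expected to coincide.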
【 License 】
Unknown