Journal Article Details
Computer Science and Information Systems
Effective methods for email classification: Is it a business or personal email?
article
Jelena Graovac [1], Milena Šošić [1]
[1] Faculty of Mathematics, University of Belgrade, Studentski Trg 16
Keywords: Email classification; business; personal; deep learning; BiLSTM; SGD; BERT embeddings; Tf-Idf; lexicons; NLP
DOI: 10.2298/CSIS220212034S
Subject classification: Civil and Structural Engineering
Source: Computer Science and Information Systems
【 Abstract 】

With the steady increase in the number of Internet users, email remains the most popular and extensively used means of communication. Email management is therefore an important and growing problem for individuals and organizations. In this paper, we deal with the classification of emails into two main categories, Business and Personal. To find the best-performing solution to this problem, we conducted a comprehensive set of experiments with two deep learning algorithms, Bidirectional Long Short-Term Memory (BiLSTM) and attention-based BiLSTM (BiLSTM+Att), together with two traditional Machine Learning (ML) algorithms: a Support Vector Machine (SVM) trained with Stochastic Gradient Descent (SGD) optimization and the Extremely Randomized Trees (ERT) ensemble method. Variations of individual email and conversational email thread arc representations were explored to reach the best classification generalization on the selected task. A special contribution of this paper is the extraction of a large number of additional lexical, conversational, expressional, emotional, and moral features, which proved very useful for differentiating between personal and official written conversations. The experiments were performed on the publicly available Enron email benchmark corpora, on which we obtained State-of-the-Art (SOA) results. As part of the submission, we have made our work publicly available to the scientific community for research purposes.
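To make the traditional ML baseline named in the abstract more concrete, the following is a minimal sketch of a linear SVM trained with SGD on Tf-Idf features. It is not the authors' pipeline: the toy messages, the whitespace tokenizer, the hyperparameters, and all function names are assumptions made for illustration, and the lexical, conversational, emotional, and moral features described in the abstract are left out. It uses only the Python standard library.

```python
import math
import random
from collections import Counter


def tokenize(text):
    # Assumption: lowercase whitespace tokenization, far simpler than real preprocessing.
    return text.lower().split()


def fit_idf(docs):
    """Smoothed inverse document frequency for every token in the training docs."""
    n = len(docs)
    df = Counter(tok for doc in docs for tok in set(tokenize(doc)))
    return {tok: math.log((1 + n) / (1 + c)) + 1.0 for tok, c in df.items()}


def tfidf(doc, idf):
    """L2-normalised sparse Tf-Idf vector; tokens unseen in training are dropped."""
    tf = Counter(tok for tok in tokenize(doc) if tok in idf)
    vec = {tok: cnt * idf[tok] for tok, cnt in tf.items()}
    norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
    return {tok: v / norm for tok, v in vec.items()}


def train_sgd_svm(vectors, labels, epochs=50, lr=0.1, lam=0.001, seed=0):
    """Linear SVM: SGD on hinge loss with an L2 penalty; labels are +1 or -1."""
    rng = random.Random(seed)
    w, b = {}, 0.0
    order = list(range(len(vectors)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            x, y = vectors[i], labels[i]
            margin = y * (sum(w.get(t, 0.0) * v for t, v in x.items()) + b)
            for t in w:  # gradient step on the L2 penalty (weight shrinkage)
                w[t] *= 1.0 - lr * lam
            if margin < 1.0:  # hinge loss is active: move towards the example
                for t, v in x.items():
                    w[t] = w.get(t, 0.0) + lr * y * v
                b += lr * y
    return w, b


def predict(doc, idf, w, b):
    x = tfidf(doc, idf)
    score = sum(w.get(t, 0.0) * v for t, v in x.items()) + b
    return "business" if score >= 0 else "personal"


# Invented toy messages (not taken from the Enron corpus): +1 = business, -1 = personal.
train_docs = [
    "please review the attached quarterly budget report before the meeting",
    "the contract draft is attached for legal review",
    "schedule a meeting with the trading desk about the gas contract",
    "are you coming to dinner with mom on saturday",
    "happy birthday hope you have a great weekend",
    "lets watch the game this weekend and grab dinner",
]
train_labels = [1, 1, 1, -1, -1, -1]

idf = fit_idf(train_docs)
vectors = [tfidf(d, idf) for d in train_docs]
w, b = train_sgd_svm(vectors, train_labels)

print(predict("can you review the budget before the meeting", idf, w, b))
```

In practice a library implementation and a held-out evaluation split would replace this hand-rolled loop; the sketch only shows how hinge-loss SGD updates act on sparse Tf-Idf vectors.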

【 License 】

CC BY-NC-ND   
