Journal Article Details
The International Arab Journal of Information Technology
Multi-Lingual Language Variety Identification using Conventional Deep Learning and Transfer
article
Authors: Sameeah Noreen Hameed (1), Muhammad Adnan Ashraf (2), Qiao Ya-nan (3)
Affiliations: School of Software, East China Jiaotong University; Department of Computer Science, Northwestern Polytechnical University; School of Computer Science and Technology, Xi’an Jiaotong University
Keywords: Language variety identification; deep learning; transfer learning; binary classification
DOI: 10.34028/iajit/19/5/1
Subject classification: Computer Science (General)
Source: Zarqa University
【 Abstract 】

Language variety identification aims to detect the lexical and semantic variations among different varieties of a single language. It helps build an author's linguistic profile from written text, which can be used for cyber forensics and marketing. Among previous efforts on language variety identification, we found hardly any study that experiments with transfer learning approaches or thoroughly compares different deep learning approaches on a range of benchmark datasets. To bridge this gap, we propose transfer learning approaches for language variety identification and compare them extensively with deep learning approaches on multiple varieties of four widely spoken languages: Arabic, English, Portuguese, and Spanish. We treat the task as a binary classification problem for Portuguese and as a multi-class classification problem for Arabic, English, and Spanish. We applied two transfer learning approaches, Bidirectional Encoder Representations from Transformers (BERT) and Universal Language Model Fine-tuning (ULMFiT); three deep learning approaches, Convolutional Neural Networks (CNN), Bidirectional Long Short-Term Memory (Bi-LSTM), and Gated Recurrent Units (GRU); and an ensemble approach for identifying the different varieties. A thorough comparison of these approaches suggests that the transfer learning based ULMFiT model outperforms all the others, achieving the best accuracy on both the binary and the multi-class language variety identification tasks.
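The abstract mentions an ensemble approach but does not state how its member models are combined. The following is a minimal sketch, assuming hard majority voting over the label predictions of several classifiers (one common ensembling choice, not necessarily the authors'), together with the accuracy metric the abstract reports. The function names and the placeholder variety labels `A`, `B`, `C` are hypothetical. The same code covers the binary (two-label) and multi-class settings, since only the label set changes.

```python
from collections import Counter
from typing import Sequence


def majority_vote(predictions: Sequence[Sequence[str]]) -> list[str]:
    """Combine per-model label predictions by hard (majority) voting.

    predictions[m][i] is the label model m assigned to example i.
    Ties are broken in favour of the model listed first, so the
    order of `predictions` acts as a priority order (an assumption).
    """
    if not predictions:
        raise ValueError("need at least one model's predictions")
    n = len(predictions[0])
    if any(len(p) != n for p in predictions):
        raise ValueError("all models must predict the same number of examples")

    combined = []
    for i in range(n):
        votes = [p[i] for p in predictions]
        counts = Counter(votes)
        top = max(counts.values())
        # Pick the first label (in model order) that reaches the top vote count.
        combined.append(next(label for label in votes if counts[label] == top))
    return combined


def accuracy(gold: Sequence[str], predicted: Sequence[str]) -> float:
    """Fraction of examples whose predicted label matches the gold label."""
    if not gold or len(gold) != len(predicted):
        raise ValueError("gold and predicted must be non-empty and of equal length")
    return sum(g == p for g, p in zip(gold, predicted)) / len(gold)


if __name__ == "__main__":
    # Hypothetical gold labels and per-model predictions for four texts.
    gold = ["A", "B", "C", "A"]
    cnn = ["A", "B", "B", "A"]
    bilstm = ["A", "C", "C", "B"]
    gru = ["B", "B", "C", "A"]

    ensemble = majority_vote([cnn, bilstm, gru])
    for name, pred in [("CNN", cnn), ("Bi-LSTM", bilstm), ("GRU", gru), ("ensemble", ensemble)]:
        print(f"{name}: {accuracy(gold, pred):.2f}")
```

In this toy run each model makes different mistakes, so the vote corrects them: the ensemble predicts `["A", "B", "C", "A"]` and scores 1.00, while the individual models score 0.75, 0.50, and 0.75. Real results depend entirely on how correlated the member models' errors are.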

【 License 】

Unknown   
