Journal Article Details
Proceedings of the XXth Conference of Open Innovations Association FRUCT
The Influence of Different Stylometric Features on the Classification of Prose by Centuries
Ksenia Lagutina [1]; Nadezhda Lagutina [2]; Ilya Paramonov [3]; Elena Boychuk [3]
[1] P.G. Demidov Yaroslavl State University, Russia; P.G. Demidov Yaroslavl State University, Russia; Yaroslavl State Pedagogical University named after K.D. Ushinsky, Russia
Keywords: stylometry; text classification; rhythm; natural language processing; classification by time periods; prose
DOI: 10.23919/FRUCT49677.2020.9211036
Source: DOAJ
【 Abstract 】

In this paper, the authors compare different types of stylometric features by classification quality: low-level features, which include character-based and word-based ones, and high-level rhythm features. The authors classified texts by century using each feature type separately and in combination, applying four classifiers: the Random Forest and AdaBoost meta-algorithms, an LSTM neural network, and a GRU neural network. Experiments with three text corpora in English, Russian, and French showed that combining rhythm features with low-level features significantly improved the quality of classification by century. In addition, the classification results made it possible to compare writing styles in the different languages in terms of sentence structure.

【 License 】

Unknown   
