Proceedings of the XXth Conference of Open Innovations Association FRUCT
The Influence of Different Stylometric Features on the Classification of Prose by Centuries
Ksenia Lagutina1, Nadezhda Lagutina2, Ilya Paramonov3, Elena Boychuk3
[1] P. G. Demidov Yaroslavl State University, Russia; P. G. Demidov Yaroslavl State University, Russia; Yaroslavl State Pedagogical University named after K. D. Ushinsky, Russia
Keywords: stylometry; text classification; rhythm; natural language processing; classification by time periods; prose
DOI: 10.23919/FRUCT49677.2020.9211036
Source: DOAJ
【 Abstract 】
In this paper, the authors compare the classification quality achieved with different types of stylometric features: low-level features, which include character-based and word-based ones, and high-level rhythm features. Texts were classified by century using each feature type separately and in combination, with four classifiers: the Random Forest and AdaBoost meta-algorithms, an LSTM neural network, and a GRU neural network. Experiments on three text corpora, in English, Russian, and French, showed that combining rhythm features with low-level features significantly improved the quality of classification by century. In addition, the classification results made it possible to compare writing styles in different languages from the point of view of sentence structure.
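To make the idea of combining feature types concrete, the following is a minimal Python sketch, not the authors' implementation. The abstract does not list the individual features, so every feature here is an assumption chosen for illustration: a comma ratio (character-based), average word length and type-token ratio (word-based), and two simple rhythm indicators based on repeated words at the start (anaphora) and end (epiphora) of adjacent sentences. All function names are hypothetical.

```python
import re


def split_sentences(text):
    # Naive splitter (assumption): a sentence ends at '.', '!' or '?'
    # followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def words(text):
    # Lowercased runs of letters; works for Latin and Cyrillic alike.
    return re.findall(r"[^\W\d_]+", text.lower())


def low_level_features(text):
    # Illustrative character-based and word-based features (assumed set).
    tokens = words(text)
    n_chars = max(len(text), 1)
    n_words = max(len(tokens), 1)
    return {
        "comma_ratio": text.count(",") / n_chars,
        "avg_word_len": sum(len(w) for w in tokens) / n_words,
        "type_token_ratio": len(set(tokens)) / n_words,
    }


def rhythm_features(text):
    # Illustrative rhythm features (assumed set): share of adjacent
    # sentence pairs that start or end with the same word.
    sents = [words(s) for s in split_sentences(text)]
    sents = [s for s in sents if s]
    pairs = list(zip(sents, sents[1:]))
    n_pairs = max(len(pairs), 1)
    anaphora = sum(a[0] == b[0] for a, b in pairs)
    epiphora = sum(a[-1] == b[-1] for a, b in pairs)
    return {
        "anaphora_ratio": anaphora / n_pairs,
        "epiphora_ratio": epiphora / n_pairs,
    }


def combined_features(text):
    # Combination of both feature types into one feature vector.
    return {**low_level_features(text), **rhythm_features(text)}


if __name__ == "__main__":
    sample = "We came home. We saw the sea. They left the sea."
    for name, value in combined_features(sample).items():
        print(f"{name}: {value:.3f}")
```

A vector of this kind, computed per text, could then be passed to any of the classifiers named in the abstract, with the century as the target label; that training step is left out here to keep the sketch dependency-free.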
【 License 】
Unknown