Journal Article Details
Proceedings of the XXth Conference of Open Innovations Association FRUCT
Frequency Word Lists and Their Variability (the Case of Russian Fiction in 1900-1930)
Anna Guseva [1], Maria Turygina [1], Irina Egoshina [1], Mary Gukasian [1], Tatiana Sherstinova [2], Tatiana Skrebtsova [2], Alexander Grebennikov [2]
[1] National Research University Higher School of Economics, Saint Petersburg, Russia; [2] St. Petersburg State University, Russia
Keywords: NLP; lexis; frequency word lists; lexical statistics; POS; lemmata; Russian; stylometry; textometry
DOI: 10.5281/zenodo.4026508
Source: DOAJ
【 Abstract 】

The lexical system is an essential component of any natural language. Frequency word lists are a convenient representation of the functional activity of words in a language as a whole or in a particular text. The parameters and properties of frequency word lists are at the center of attention of NLP experts, since they are used in numerous practical applications related to authorship attribution and automatic text clustering and classification. The article explores frequency word lists of Russian fiction from the period 1900-1930, which was marked by a series of dramatic historical events, and presents unique statistical data on the most frequent words, parts of speech, and keywords, and on their dynamics. Special attention is paid to the statistical consistency of frequency word list parameters, which becomes especially relevant when studying big text data. The study is based on fiction texts, which, owing to their variety of topics and their lexical and stylistic diversity, reflect the variability of linguistic forms better than other written genres. In terms of the size and character of the text corpus, research of this kind is being carried out for the first time.

【 License 】

Unknown   
