Journal Article Details
Lithuanian Journal of Statistics
Statistical Analysis of Word Frequency Distribution in Lithuanian Texts of Different Genres
Neringa Bružaitė [1], Tomas Rekašius [1]
[1] Vilnius Gediminas Technical University, Lithuania;
Keywords: word frequencies; structural distribution; Zipf’s law; hierarchical clustering; Jaccard distance; Ward method
DOI: 10.15388/LJS.2016.13868
Source: DOAJ
【 Abstract 】

The paper examines Lithuanian texts by different authors and of different genres. The main points of interest are the number of words, the number of different words, and word frequencies. The structural type distribution and Zipf’s law are applied to describe the frequency distribution of words in a text. The lexical diversity of any text can be characterized by the different words used in it, also called its vocabulary. It is shown that the information contained in a reduced vocabulary is enough to divide the texts analyzed in this article into groups by genre and author using a hierarchical clustering method. In this case, distances between clusters are measured using the Jaccard distance measure, and clusters are aggregated using the Ward method.
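
The pipeline outlined in the abstract (word frequencies, a reduced vocabulary, Jaccard distances, Ward aggregation) can be illustrated with a minimal sketch. It is not the authors' implementation. The function names, the `top_n` cutoff used to reduce the vocabulary, and the toy English texts are all assumptions made for illustration; the abstract does not say how the vocabulary was reduced. Ward's criterion is applied here through the Lance-Williams update on squared Jaccard distances. Since Ward's method is formally defined for Euclidean distances, this should be read as a heuristic.

```python
import re
from collections import Counter


def word_counts(text):
    """Count lowercase word tokens (runs of Unicode letters)."""
    return Counter(re.findall(r"[^\W\d_]+", text.lower()))


def reduced_vocabulary(text, top_n=50):
    """Assumed reduction rule: keep only the top_n most frequent words."""
    return {word for word, _ in word_counts(text).most_common(top_n)}


def jaccard_distance(a, b):
    """Jaccard distance between two sets: 1 - |A & B| / |A | B|."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def ward_cluster(vocabs, n_clusters):
    """Agglomerative clustering using the Lance-Williams update for
    Ward's method on squared Jaccard distances.

    Returns a sorted list of clusters, each a sorted list of text indices.
    """
    clusters = {i: [i] for i in range(len(vocabs))}
    # Squared pairwise distances, keyed by (smaller id, larger id).
    d2 = {
        (i, j): jaccard_distance(vocabs[i], vocabs[j]) ** 2
        for i in range(len(vocabs))
        for j in range(i + 1, len(vocabs))
    }
    next_id = len(vocabs)
    while len(clusters) > max(n_clusters, 1):
        i, j = min(d2, key=d2.get)  # closest pair under Ward's criterion
        ni, nj, dij = len(clusters[i]), len(clusters[j]), d2[(i, j)]
        updated = {}
        for k in clusters:
            if k in (i, j):
                continue
            nk = len(clusters[k])
            dik = d2[(min(i, k), max(i, k))]
            djk = d2[(min(j, k), max(j, k))]
            # Lance-Williams formula for Ward's method.
            updated[(k, next_id)] = (
                (ni + nk) * dik + (nj + nk) * djk - nk * dij
            ) / (ni + nj + nk)
        d2 = {p: v for p, v in d2.items() if i not in p and j not in p}
        d2.update(updated)
        clusters[next_id] = clusters.pop(i) + clusters.pop(j)
        next_id += 1
    return sorted(sorted(members) for members in clusters.values())


# Hypothetical toy corpus: two "story" texts and two "finance" texts.
texts = [
    "the cat sat on the mat and the cat slept",
    "the cat sat on the rug and the dog slept",
    "stock prices rose as markets rallied on strong earnings",
    "stock prices fell as markets slid on weak earnings",
]
vocabs = [reduced_vocabulary(t) for t in texts]
groups = ward_cluster(vocabs, n_clusters=2)
print(groups)
```

With real data, each entry of `texts` would be a full Lithuanian text, and `top_n` (or any other reduction rule) would need to be chosen to match the study's definition of a reduced vocabulary.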

【 License 】

Unknown   
