Conference Paper Details
1st International Scientific Practical Conference "Breakthrough Technologies and Communications in Industry"
Towards a graph model application for automatic text processing in data management
Industrial technology (general); Radio electronics
Grigoryeva, E.G.^1 ; Kochetova, L.A.^1 ; Pomelnikov, Y.V.^1 ; Popov, V.V.^1 ; Shtelmakh, T.V.^1
^1 Volgograd State University, 100 Universitetsky Prosp., Volgograd 400062, Russia
Keywords: Automatic text processing; Bag of words; Frequency tables; Graph model; Natural language text; Natural languages; Text analysis; Word lists
Full text (PDF): https://iopscience.iop.org/article/10.1088/1757-899X/483/1/012077/pdf
DOI: 10.1088/1757-899X/483/1/012077
Subject classification: Industrial engineering
Source: IOP
Abstract

Based on two models, the "bag-of-words" model and the graph model, the paper develops methods for automated text analysis aimed at classifying natural language texts and randomly generated documents. Within the bag-of-words model, the authors found that Zipf's first law, which states that in a corpus of natural language utterances the frequency of any word is inversely proportional to its rank in the frequency table, does not hold consistently. They propose modifications to this law that allow texts to be classified more efficiently. Using a graph model of the text, which takes into account the co-occurrence of any two words within a sentence, together with the median vertex degree of the graph, the authors demonstrate that the model can distinguish meaningless texts from meaningful ones even when the two texts have identical word lists.
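The two models described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the naive tokenizer (sentences end at . ! ?, words are runs of word characters), the function names, and the toy texts are hypothetical, and the paper's proposed modifications to Zipf's law are not reproduced; only the classic rank × frequency product is computed. The co-occurrence graph links two distinct words whenever they appear in the same sentence, and the example pairs two texts with identical word lists (hence identical frequency tables) but different median vertex degrees.

```python
import re
from collections import Counter
from itertools import combinations
from statistics import median


def split_sentences(text):
    """Naive tokenizer (assumption): sentences end at . ! ?, words are runs
    of word characters, lowercased. Empty sentences are dropped."""
    tokens = [re.findall(r"\w+", s.lower()) for s in re.split(r"[.!?]+", text)]
    return [t for t in tokens if t]


def frequency_table(sentences):
    """Bag-of-words frequency table: (word, count) pairs, most frequent first."""
    return Counter(w for s in sentences for w in s).most_common()


def zipf_products(table):
    """Classic Zipf's law predicts f(r) ~ C / r, so r * f(r) should stay
    roughly constant; strong drift means the law does not hold there."""
    return [(r, w, f, r * f) for r, (w, f) in enumerate(table, start=1)]


def cooccurrence_graph(sentences):
    """Undirected graph as adjacency sets: two distinct words are linked
    if they occur together in at least one sentence."""
    adj = {}
    for sentence in sentences:
        words = set(sentence)
        for w in words:
            adj.setdefault(w, set())  # isolated words still count as vertices
        for a, b in combinations(words, 2):
            adj[a].add(b)
            adj[b].add(a)
    return adj


def median_degree(adj):
    """Median number of neighbours per vertex."""
    return median(len(nbrs) for nbrs in adj.values())


if __name__ == "__main__":
    # Hypothetical toy texts: same word list, different arrangement.
    text_a = "the cat sat. the cat ran. the dog sat."
    text_b = "the the the. cat cat sat. ran dog sat."
    sa, sb = split_sentences(text_a), split_sentences(text_b)
    print(sorted(frequency_table(sa)) == sorted(frequency_table(sb)))  # True
    print(zipf_products(frequency_table(sa)))
    print(median_degree(cooccurrence_graph(sa)),
          median_degree(cooccurrence_graph(sb)))  # 3 2
```

Because the frequency tables match, a bag-of-words statistic alone cannot separate these two toy texts, whereas the median degree differs (3 versus 2). The abstract does not state which direction or threshold separates meaningful from meaningless texts, so none is assumed here.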
