1st International Scientific Practical Conference "Breakthrough Technologies and Communications in Industry"
Towards a graph model application for automatic text processing in data management
Industrial technology (general); Radio electronics
Grigoryeva, E.G.^1 ; Kochetova, L.A.^1 ; Pomelnikov, Y.V.^1 ; Popov, V.V.^1 ; Shtelmakh, T.V.^1
Volgograd State University, 100 Universitetsky Prosp., Volgograd 400062, Russia^1
Keywords: Automatic text processing; Bag of words; Frequency Tables; Graph model; Natural language text; Natural languages; Text analysis; Word lists
Others: https://iopscience.iop.org/article/10.1088/1757-899X/483/1/012077/pdf
DOI: 10.1088/1757-899X/483/1/012077
Subject classification: Industrial engineering
Source: IOP
【 Abstract 】
Drawing on two models, the "bag-of-words" model and a graph model, the paper develops methods for automated text analysis aimed at classifying natural language texts and randomly generated documents. Within the "bag-of-words" model, the authors find that the primary Zipf's law, which states that in a corpus of natural language utterances the frequency of any word is inversely proportional to its rank in the frequency table, does not always hold. They propose modifications to this law that allow texts to be classified more efficiently. Using a graph model of the text, which takes into account the occurrence of any two words in the same sentence, together with the median degree of the graph's vertices, the authors demonstrate that meaningless texts can be distinguished from meaningful ones even when the word lists of the two texts are identical.
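The two views described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the tokenizer, the function names, and the rule that an edge joins any two distinct words sharing a sentence are assumptions made for illustration, and the toy sentences are invented. Under Zipf's law the product of rank and frequency would be roughly constant, so the sketch reports that product alongside each word; a text this small is far too short to test the law.

```python
import re
from collections import Counter
from itertools import combinations
from statistics import median


def tokenize(sentence):
    """Lowercase a sentence and split it into word tokens (assumed tokenizer)."""
    return re.findall(r"[a-z']+", sentence.lower())


def word_counts(sentences):
    """Bag-of-words view: how often each word occurs, ignoring word order."""
    return Counter(w for s in sentences for w in tokenize(s))


def rank_frequency(sentences):
    """Frequency table rows (rank, word, count, rank * count), most frequent first.

    Under Zipf's law, rank * count would stay roughly constant down the table.
    """
    ranked = word_counts(sentences).most_common()
    return [(r, w, n, r * n) for r, (w, n) in enumerate(ranked, start=1)]


def cooccurrence_graph(sentences):
    """Graph view: words are vertices; an edge joins two distinct words
    that occur together in at least one sentence (assumed edge rule)."""
    adjacency = {}
    for s in sentences:
        words = set(tokenize(s))
        for w in words:
            adjacency.setdefault(w, set())
        for a, b in combinations(sorted(words), 2):
            adjacency[a].add(b)
            adjacency[b].add(a)
    return adjacency


def median_degree(adjacency):
    """Median number of neighbours over all vertices of the graph."""
    return median(len(neighbours) for neighbours in adjacency.values())


# Invented toy texts with identical word lists but different sentence structure.
meaningful = ["the cat sat on the mat", "the dog sat on the rug"]
scrambled = ["the the the the cat dog", "sat sat on on mat rug"]

print(word_counts(meaningful) == word_counts(scrambled))
print(median_degree(cooccurrence_graph(meaningful)))
print(median_degree(cooccurrence_graph(scrambled)))
```

In this toy case the two texts have the same word counts, so the bag-of-words view cannot separate them, while the median vertex degree of the co-occurrence graph differs. Whether that statistic separates real meaningful and meaningless texts, and with what threshold, is a question for the paper itself.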