Сучасні інформаційні системи
METHOD FOR DETERMINING THE SEMANTIC SIMILARITY OF ARBITRARY LENGTH TEXTS USING THE TRANSFORMERS MODELS
В’ячеслав Радченко [1], Сергій Олізаренко [2]
[1] Kharkiv National University of Radio Electronics, Kharkiv; [2] Kharkiv National University of Radio Electronics, Kharkiv
Keywords: text; arbitrary length; semantic similarity; vector representation; Transformer model; fine-tuning
DOI: 10.20998/2522-9052.2021.2.18
Source: DOAJ
【 Abstract 】
The paper presents a method for determining the semantic similarity of arbitrary-length texts based on their vector representations. These vector representations are obtained with a multilingual Transformer model, and the task of determining semantic similarity is treated as a classification problem over pairs of text sequences. A comparative analysis was performed to select the most suitable Transformer model for this class of problems. The main stages of the method are: fine-tuning the Transformer model within the framework of the pretrained model's second task (sentence prediction); and selecting and implementing a summarization method for text sequences longer than 512 (1024) tokens, so that semantic similarity can be determined for texts of arbitrary length.
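
The interplay between the pair-classification input and the 512-token limit can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a BERT-style pair layout (`[CLS] A [SEP] B [SEP]`, three special tokens), uses a hypothetical whitespace tokenizer in place of a real subword tokenizer, and stands in plain truncation for the summarization method, which the abstract does not specify. The names `build_pair`, `whitespace_tokenize`, and `truncate_summary` are illustrative only.

```python
from typing import Callable, List

MAX_TOKENS = 512  # limit named in the abstract; some models allow 1024


def whitespace_tokenize(text: str) -> List[str]:
    """Hypothetical stand-in for a real subword tokenizer."""
    return text.split()


def truncate_summary(tokens: List[str], budget: int) -> List[str]:
    """Placeholder 'summarizer' (assumption): keeps the first `budget` tokens."""
    return tokens[:budget]


def build_pair(
    text_a: str,
    text_b: str,
    max_tokens: int = MAX_TOKENS,
    summarize: Callable[[List[str], int], List[str]] = truncate_summary,
) -> List[str]:
    """Build a BERT-style pair input, shortening the texts only if needed."""
    special = 3  # [CLS], [SEP], [SEP] (assumed layout)
    budget = max_tokens - special
    a = whitespace_tokenize(text_a)
    b = whitespace_tokenize(text_b)
    if len(a) + len(b) > budget:
        # Split the budget roughly in half, but let the longer text
        # use any room the shorter text leaves free.
        budget_a = min(len(a), max(budget // 2, budget - len(b)))
        budget_b = budget - budget_a
        a = summarize(a, budget_a)
        b = summarize(b, budget_b)
    return ["[CLS]"] + a + ["[SEP]"] + b + ["[SEP]"]
```

In a real pipeline, `summarize` would be replaced by the chosen summarization method, and the resulting token sequence would be passed to the fine-tuned pair classifier.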
【 License 】
Unknown