Jurnal RESTI (Rekayasa Sistem dan Teknologi Informasi)
Analysis of Stylometric Features and Segmentation Strategies in Intrinsic Plagiarism Detection System
Antonius Rachmat Chrismanto [1], Sylvia Putri Gunawan [2], Lucia Dwi Krisnawati [2]
[1] UKDW; Universitas Kristen Duta Wacana
Keywords: intrinsic plagiarism detection; stylometry features; text segmentation; outlier
DOI: 10.29207/resti.v4i5.2486
Source: DOAJ
[Abstract]
Two different paradigms in the field of plagiarism detection have resulted in External Plagiarism Detection (EPD) and Intrinsic Plagiarism Detection (IPD) systems. The more commonly applied system is EPD, which requires its algorithm to make a heuristic comparison between a suspicious document and the documents in a corpus. In contrast, given only a suspicious document, an IPD algorithm should be able to find the plagiarized sections by looking for text segments written in a different style. Previous research on Indonesian texts has addressed only the development of EPD systems. This research therefore contributes by experimenting with and analyzing stylometric features and segmentation strategies for building an IPD system for Indonesian texts. The experimental results show that paragraph segments perform better, scoring 0.92 for macro-averaged accuracy and 0.54 for macro-averaged F1. The stylometric features achieving the highest F1 and accuracy scores are the frequency of punctuation, the average paragraph length, and the type-token ratio.
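As an illustration of the three best-scoring features named above, the following is a minimal Python sketch that segments a text into paragraphs and computes punctuation frequency, paragraph length, and type-token ratio for each segment. It is not the authors' implementation. The function names, the blank-line paragraph delimiter, the regex tokenizer, and the normalization choices (punctuation per character, length in word tokens) are all assumptions made for this example.

```python
import re
import string


def split_paragraphs(text):
    """Segment a document into paragraphs.

    Assumption: paragraphs are separated by one or more blank lines.
    """
    return [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]


def stylometric_features(paragraph):
    """Compute three simple stylometric features for one text segment.

    Assumptions: tokens are lowercase runs of word characters, and
    punctuation frequency is normalized by the segment's character count.
    """
    tokens = re.findall(r"\w+", paragraph.lower())
    n_punct = sum(1 for ch in paragraph if ch in string.punctuation)
    n_chars = len(paragraph)
    return {
        # Share of characters that are ASCII punctuation marks.
        "punct_freq": n_punct / n_chars if n_chars else 0.0,
        # Segment length measured in word tokens.
        "length_tokens": len(tokens),
        # Distinct word forms divided by total word tokens.
        "type_token_ratio": len(set(tokens)) / len(tokens) if tokens else 0.0,
    }


def average_paragraph_length(text):
    """Mean paragraph length, in tokens, across the whole document."""
    lengths = [stylometric_features(p)["length_tokens"]
               for p in split_paragraphs(text)]
    return sum(lengths) / len(lengths) if lengths else 0.0


if __name__ == "__main__":
    sample = "Halo, dunia! Halo lagi.\n\nIni paragraf kedua."
    for para in split_paragraphs(sample):
        print(para, stylometric_features(para))
    print("average paragraph length:", average_paragraph_length(sample))
```

In an intrinsic detector, per-segment feature vectors like these would then be compared against the document's overall profile so that stylistically deviating segments can be flagged as outliers. That comparison step is omitted here because the abstract does not specify how it is done.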
[License]
Unknown