Journal Article Details
Sinkron
A proposed approach for plagiarism detection in Article documents
Ayoub Ali M. Saeed [1], Alaa Yaseen Taqa [1]
[1] College of Education for Pure Sciences, University of Mosul, Mosul, Iraq
Keywords: plagiarism, plagiarism detection, clustering, tfidf, cosine similarity
DOI: 10.33395/sinkron.v7i2.11381
Source: DOAJ
【 Abstract 】

Scientific institutions define plagiarism as claiming someone else's ideas or work as one's own without citing the source. Plagiarism detection systems typically apply a text similarity algorithm to find sentences shared between source and suspicious documents, either by matching the sentences directly or by embedding them as vectors, using TF-IDF or a similar method, and then calculating the distance or similarity between the source and suspect sentence vectors. Cosine similarity is one such measure. To cluster the documents and select only related documents for detection, an unsupervised machine learning technique such as K-means can be used. In this paper, a plagiarism detection application was built and tested on several text document formats, including doc, docx, and pdf, using research papers collected from the web to build the source corpus. To calculate the level of similarity between the suspicious article and the corpus of source articles, the TF-IDF text encoding approach is used together with NLP, K-means clustering, and cosine similarity. The application was run on five documents and produced the following plagiarism ratios: 0.27 for the first document, 0.15 for the second, 0.19 for the third, 0.42 for the fourth, and 0.37 for the fifth. The generated detailed report presents the percentage of plagiarism in the suspicious article. Depending on a threshold value, the application decides whether the suspicious document is acceptable.
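
The sentence-level comparison described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the smoothed IDF variant, the rule that a sentence counts as plagiarized when its best cosine match reaches `sentence_threshold`, and both threshold values are assumptions made for illustration. File parsing (doc, docx, pdf) and the K-means pre-selection of related source documents are omitted.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a sentence and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(sentences):
    """Return one sparse TF-IDF vector (dict: term -> weight) per sentence."""
    token_lists = [tokenize(s) for s in sentences]
    n = len(token_lists)
    # Document frequency: number of sentences that contain each term.
    df = Counter(term for tokens in token_lists for term in set(tokens))
    # Smoothed IDF (an assumed variant): ln((1 + n) / (1 + df)) + 1.
    idf = {t: math.log((1 + n) / (1 + d)) + 1.0 for t, d in df.items()}
    vectors = []
    for tokens in token_lists:
        tf = Counter(tokens)  # raw term counts within the sentence
        vectors.append({t: c * idf[t] for t, c in tf.items()})
    return vectors


def cosine_similarity(u, v):
    """Cosine of the angle between two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def plagiarism_ratio(suspect_sentences, source_sentences, sentence_threshold=0.8):
    """Fraction of suspect sentences whose best source match reaches the threshold.

    The 0.8 default is a hypothetical value, not one reported in the paper.
    """
    if not suspect_sentences:
        return 0.0
    # Fit TF-IDF on suspect and source sentences together so they share one vocabulary.
    vectors = tfidf_vectors(suspect_sentences + source_sentences)
    suspect_vecs = vectors[:len(suspect_sentences)]
    source_vecs = vectors[len(suspect_sentences):]
    flagged = 0
    for sv in suspect_vecs:
        best = max((cosine_similarity(sv, src) for src in source_vecs), default=0.0)
        if best >= sentence_threshold:
            flagged += 1
    return flagged / len(suspect_sentences)


def is_acceptable(ratio, document_threshold=0.3):
    """Accept the document when its ratio does not exceed the (hypothetical) threshold."""
    return ratio <= document_threshold
```

Calling `plagiarism_ratio(suspect, source)` with two lists of sentence strings yields a value between 0 and 1 that can be passed to `is_acceptable`. With the illustrative 0.3 threshold, the reported ratios 0.27, 0.15, and 0.19 would pass, while 0.42 and 0.37 would not.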

【 License 】

Unknown   
