Journal Article Details
Symmetry
Source Code Authorship Identification Using Deep Neural Networks
Aleksandr Romanov [1], Alexander Shelupanov [1], Anna Kurtukova [1]
[1] Faculty of Security, Tomsk State University of Control Systems and Radioelectronics, 634050 Tomsk, Russia
Keywords: source code; authorship; symmetry; software engineering; machine learning; deanonymization
DOI: 10.3390/sym12122044
Source: DOAJ
【 Abstract 】

Many open-source projects are developed by the community and share a common basis. The more of a project's source code is open, the more open the project is to contributors. It cannot be ruled out that someone else's source code will be used, accidentally or deliberately, as closed functionality in another project (even a commercial one). This situation can lead to copyright disputes. Adding a plagiarism check to the project lifecycle during software engineering solves this problem. However, not all code samples needed for comparison can be found in the public domain. In such cases, methods for identifying the source code author can be useful. Identifying the source code author is therefore an important problem in software engineering, and it is also a research area in symmetry. This article discusses the problem of identifying the source code author and modern methods for solving it. Drawing on the experience of researchers in natural language processing (NLP), the authors propose a technique based on a hybrid neural network and demonstrate its results both for simple cases of determining code authorship and for cases complicated by obfuscation and the use of coding standards. The results show that the authors' technique successfully addresses the main problems of analogous methods and can be effective even when there are no obvious signs indicating authorship. The average accuracy obtained across all programming languages was 95% in the simple case and exceeded 80% in the complicated ones.

【 License 】

Unknown   
