Journal article details
EAI Endorsed Transactions on Scalable Information Systems
Developing a hyperparameter optimization method for classification of code snippets and questions of stack overflow: HyperSCC
article
Muhammed Maruf Öztürk [1]
[1] Department of Computer Engineering, Suleyman Demirel University, West Campus
Keywords: Multi-label classification; hyperparameter optimization; programming language prediction
DOI: 10.4108/eai.27-5-2022.174084
Subject classification: Social sciences, humanities and arts (general)
Source: Bern Open Publishing
【 Abstract 】

Although various machine learning and text mining techniques exist to identify the programming language of complete code files, multi-label code snippet prediction has not been considered by the research community. This work aims to devise a tuner for multi-label programming language prediction of Stack Overflow posts. To that end, a Hyper Source Code Classifier (HyperSCC) is devised along with rule-based automatic labeling by considering the bottlenecks of multi-label classification. The proposed method is evaluated on seven multi-label predictors to conduct an extensive analysis. The method is further compared with three competitive alternatives in terms of one-label programming language prediction. HyperSCC outperformed the other methods in terms of the F1 score. Preprocessing yields a large reduction (50%) in training time when ensemble multi-label predictors are employed. In one-label programming language prediction, Gradient Boosting Machine (gbm) yields the highest accuracy (0.99) in predicting R posts, which contain many distinctive words that determine labels. The findings support the hypothesis that multi-label predictors can be strengthened with sophisticated feature selection and labeling approaches.
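
To make the idea of rule-based automatic labeling and F1-based evaluation more concrete, here is a minimal Python sketch. It is not the HyperSCC implementation: the abstract does not give the actual rules, so the `RULES` table, the function names `label_post` and `micro_f1`, and the sample posts are all hypothetical and chosen only for illustration. The sketch assumes a post receives every label for which at least one keyword pattern matches, and it scores predictions with the micro-averaged F1 score.

```python
import re

# Hypothetical keyword rules (assumption): the paper's real labeling rules
# are not given in the abstract. Each label maps to regex patterns.
RULES = {
    "python": [r"\bdef\s+\w+\s*\(", r"\bimport\s+\w+"],
    "r": [r"<-", r"\blibrary\s*\("],
    "sql": [r"\bselect\b.+\bfrom\b"],
}


def label_post(text):
    """Return the sorted list of labels whose rules match the post text."""
    labels = set()
    for label, patterns in RULES.items():
        # A single matching pattern is enough to assign the label.
        if any(re.search(p, text, flags=re.IGNORECASE | re.DOTALL) for p in patterns):
            labels.add(label)
    return sorted(labels)


def micro_f1(true_sets, pred_sets):
    """Micro-averaged F1 over all (post, label) pairs."""
    tp = fp = fn = 0
    for true, pred in zip(true_sets, pred_sets):
        true, pred = set(true), set(pred)
        tp += len(true & pred)   # labels predicted and correct
        fp += len(pred - true)   # labels predicted but wrong
        fn += len(true - pred)   # labels missed
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


if __name__ == "__main__":
    posts = [
        "library(ggplot2)\nx <- c(1, 2)",
        "import sqlite3\ncur.execute('SELECT name FROM users')",
    ]
    predicted = [label_post(p) for p in posts]
    print(predicted)
    print(micro_f1([["r"], ["python", "sql"]], predicted))
```

Rules this simple over-match (for example, `import` also appears in Java code), which is one reason the generated labels would normally be paired with feature selection and a tuned classifier rather than used as final predictions.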

【 License 】

CC BY   
