The International Arab Journal of Information Technology
Text Mining Approaches for Dependent Bug Report Assembly and Severity Prediction
Article
Bancha Luaphol [1], Jantima Polpinij [2], Manasawee Kaenampornpan [2]
[1] Department of Digital Technology, Kalasin University; [2] Department of Computer Science, Mahasarakham University
Keywords: Bug report; dependent bug report assembly; bug severity prediction; threshold-based similarity analysis; cosine similarity; BM25; term weighting; classification algorithm
DOI: 10.34028/iajit/19/6/9
Subject classification: Computer Science (General)
Source: Zarqa University
【 Abstract 】
In general, most existing bug report studies focus on solving only a single specific issue. Considering multiple issues at once is required for a more complete and comprehensive bug-fixing process. We took up this challenge and propose a method that analyzes two bug report issues using text mining techniques. First, dependent bug reports are assembled into individual clusters, and then the bug reports in each cluster are analyzed for their severity. Dependent bug report assembly is evaluated with threshold-based similarity analysis: cosine similarity and BM25 are compared, using term frequency (tf) weighting, to identify the most appropriate method. Meanwhile, four classification algorithms, namely Random Forest (RF), Support Vector Machines (SVM) with the RBF kernel function, Multinomial Naïve Bayes (MNB), and k-Nearest Neighbor (k-NN), are used to model the bug severity predictor with four term weighting schemes: tf, term frequency-inverse document frequency (tf-idf), term frequency-inverse class frequency (tf-icf), and term frequency-inverse gravity moment (tf-igm). In the experiments, BM25 was found to be the most appropriate method for dependent bug report assembly, while for severity prediction, RF with tf-icf weighting yielded the best performance.
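To make the threshold-based similarity step concrete, the following is a minimal sketch, not the authors' implementation. It scores one bug report against the others with either cosine similarity over raw tf vectors or Okapi BM25, and keeps those at or above a threshold as dependency candidates. The whitespace tokenizer, the BM25 parameters `k1=1.5` and `b=0.75`, the sample reports, the threshold values, and the function names are all assumptions made for illustration; the abstract does not specify any of them.

```python
import math
from collections import Counter

# Hypothetical sample reports, invented for illustration only.
REPORTS = [
    "app crashes when saving file to disk",
    "crash on saving file after disk full error",
    "login page button misaligned on mobile",
]

def tokenize(text):
    """Naive lowercase whitespace tokenizer (an assumption, no stemming or stop words)."""
    return text.lower().split()

def cosine_tf(a_tokens, b_tokens):
    """Cosine similarity between raw term-frequency vectors."""
    ta, tb = Counter(a_tokens), Counter(b_tokens)
    dot = sum(ta[t] * tb[t] for t in ta.keys() & tb.keys())
    norm_a = math.sqrt(sum(v * v for v in ta.values()))
    norm_b = math.sqrt(sum(v * v for v in tb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def bm25_scores(query_tokens, docs_tokens, k1=1.5, b=0.75):
    """Okapi BM25 score of each document against the query (k1, b are assumed defaults)."""
    n = len(docs_tokens)
    if n == 0:
        return []
    avgdl = sum(len(d) for d in docs_tokens) / n
    # Document frequency: number of documents containing each term.
    df = Counter(t for d in docs_tokens for t in set(d))
    scores = []
    for d in docs_tokens:
        tf = Counter(d)
        score = 0.0
        for t in set(query_tokens):
            if t not in tf:
                continue
            idf = math.log((n - df[t] + 0.5) / (df[t] + 0.5) + 1.0)
            length_norm = 1 - b + b * len(d) / avgdl
            score += idf * tf[t] * (k1 + 1) / (tf[t] + k1 * length_norm)
        scores.append(score)
    return scores

def find_dependent(reports, query_idx, threshold, method="cosine"):
    """Return indices of reports whose similarity to the query report meets the threshold."""
    tokens = [tokenize(r) for r in reports]
    others = [i for i in range(len(reports)) if i != query_idx]
    if method == "cosine":
        scores = [cosine_tf(tokens[query_idx], tokens[i]) for i in others]
    else:
        scores = bm25_scores(tokens[query_idx], [tokens[i] for i in others])
    return [i for i, s in zip(others, scores) if s >= threshold]

if __name__ == "__main__":
    print(find_dependent(REPORTS, 0, threshold=0.3, method="cosine"))
    print(find_dependent(REPORTS, 0, threshold=0.5, method="bm25"))
```

Note that cosine similarity is bounded in [0, 1] while BM25 scores are unbounded, so a threshold chosen for one measure does not transfer to the other; in this sketch each method is given its own threshold.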
【 License 】
Unknown