Conference: IEEE Annual Computer Software and Applications Conference

Software Fault Proneness Prediction with Group Lasso Regression: On Factors that Affect Classification Performance



Abstract

Machine learning algorithms have been used extensively for software fault proneness prediction. This paper presents the first application of Group Lasso Regression (G-Lasso) to software fault proneness classification and compares its performance to that of six widely used machine learning algorithms. Furthermore, we explore the effects of two factors on prediction performance: imbalance treatment using the Synthetic Minority Over-sampling Technique (SMOTE), and the datasets used in building the prediction models. Our experimental results are based on 22 datasets extracted from open source projects. The main findings include: (1) G-Lasso is robust to imbalanced data and significantly outperforms the other machine learning algorithms with respect to Recall and G-Score, i.e., the harmonic mean of Recall and (1 - False Positive Rate). (2) Even though SMOTE improved the performance of all learners, it did not have a statistically significant effect on G-Lasso's Recall and G-Score. Random Forest was in the top-performing group of learners for all performance metrics, while Naive Bayes performed the worst of all learners. (3) When using the same change metrics as features, the choice of dataset had no effect on the performance of most learners, including G-Lasso. Naive Bayes was the most affected, especially when balanced datasets were used.
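
The G-Score defined in the abstract, the harmonic mean of Recall and (1 - False Positive Rate), can be illustrated with a minimal Python sketch. The function name `g_score` and the confusion-matrix counts below are hypothetical and chosen only for illustration; they do not come from the paper, and returning 0.0 for undefined ratios is an assumption of this sketch.

```python
def g_score(tp: int, fp: int, tn: int, fn: int) -> float:
    """Harmonic mean of Recall and (1 - False Positive Rate).

    tp, fp, tn, fn are confusion-matrix counts for the fault-prone class.
    Undefined ratios (empty classes) are treated as 0.0 here by assumption.
    """
    # Recall: share of truly fault-prone units that were flagged.
    recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0
    # False Positive Rate: share of fault-free units wrongly flagged.
    fpr = fp / (fp + tn) if (fp + tn) > 0 else 0.0
    one_minus_fpr = 1.0 - fpr
    denominator = recall + one_minus_fpr
    if denominator == 0:
        return 0.0
    return 2 * recall * one_minus_fpr / denominator


# Hypothetical counts for an imbalanced dataset (40 faulty, 160 fault-free).
score = g_score(tp=30, fp=20, tn=140, fn=10)
print(round(score, 4))
```

With these made-up counts, Recall is 0.75 and (1 - False Positive Rate) is 0.875. Because the harmonic mean is pulled toward the smaller of its two inputs, a learner scores well on this metric only if it both finds faulty units and avoids flagging fault-free ones.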
