Conference: Asia-Pacific Software Engineering Conference

Impact of the Distribution Parameter of Data Sampling Approaches on Software Defect Prediction Models

Abstract

Sampling methods are known to affect defect prediction performance. These methods have configurable parameters that can significantly change prediction performance. However, it is impractical to assess every possible setting in the parameter space of each of the many existing sampling methods. One parameter that is present in all sampling methods and is easy to tweak is the distribution of defective and non-defective modules in the dataset, known as Pfp (the percentage of fault-prone modules). In this paper, we investigate and assess the performance of defect prediction models when the Pfp parameter of the sampling methods is tweaked. An empirical experiment assessing seven sampling methods on five prediction models over 20 releases of 10 projects characterized by static metrics indicates that (1) performance in terms of the Area Under the Receiver Operating Characteristic Curve (AUC) does not improve after tweaking the Pfp parameter, (2) pf (false alarm) performance degrades as Pfp increases, and (3) a stable predictor is difficult to achieve across different Pfp rates. Hence, we conclude that the Pfp setting can have a large impact on the performance (except AUC) of defect prediction models. We therefore recommend that researchers experiment with the Pfp parameter of their sampling method, since the distribution of training datasets varies.
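To make the Pfp parameter concrete, here is a minimal sketch, assuming random oversampling of defective modules as the sampling method (the abstract does not name the seven methods it studies, so this is only one simple stand-in). It resamples a dataset to a chosen Pfp and computes pf, the false-alarm rate FP / (FP + TN). The helper names `pfp`, `oversample_to_pfp`, and `false_alarm_rate`, and the toy data, are hypothetical illustrations, not the authors' code.

```python
import random
from typing import List, Tuple

# Hypothetical module representation: (static metric vector, label),
# where label 1 = defective and 0 = non-defective.
Module = Tuple[List[float], int]


def pfp(data: List[Module]) -> float:
    """Percentage of fault-prone (defective) modules in a dataset."""
    return 100.0 * sum(label for _, label in data) / len(data)


def oversample_to_pfp(data: List[Module], target_pfp: float, seed: int = 0) -> List[Module]:
    """Randomly duplicate defective modules until they make up target_pfp % of the data.

    Hypothetical helper: plain random oversampling is only one of many sampling methods.
    """
    if not 0 < target_pfp < 100:
        raise ValueError("target_pfp must be strictly between 0 and 100")
    rng = random.Random(seed)
    defective = [m for m in data if m[1] == 1]
    clean = [m for m in data if m[1] == 0]
    if not defective:
        raise ValueError("no defective modules to oversample")
    # Solve d / (d + c) = p for d, keeping the clean modules unchanged.
    p = target_pfp / 100.0
    needed = round(p * len(clean) / (1 - p))
    if needed < len(defective):
        raise ValueError("target_pfp is below the current Pfp; oversampling cannot lower it")
    extra = [rng.choice(defective) for _ in range(needed - len(defective))]
    return clean + defective + extra


def false_alarm_rate(y_true: List[int], y_pred: List[int]) -> float:
    """pf = FP / (FP + TN): share of non-defective modules wrongly flagged as defective."""
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return fp / (fp + tn) if fp + tn else 0.0


if __name__ == "__main__":
    rng = random.Random(42)
    # Toy dataset: 100 modules with two made-up metrics each, 10 % defective.
    data = [([rng.random(), rng.random()], 1 if i < 10 else 0) for i in range(100)]
    print(pfp(data))  # 10.0
    for target in (25, 50):
        resampled = oversample_to_pfp(data, target)
        print(target, len(resampled), pfp(resampled))
    print(false_alarm_rate([0, 0, 0, 0, 1], [1, 0, 0, 0, 1]))  # 0.25
```

In an experiment like the one described, one would typically resample only the training release at several Pfp values and then measure pf (and AUC) on an untouched test release, so that the evaluation data keeps its natural class distribution.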