Conference: IEEE International Conference on Software Maintenance and Evolution

Supervised vs Unsupervised Models: A Holistic Look at Effort-Aware Just-in-Time Defect Prediction


Abstract

Effort-aware just-in-time (JIT) defect prediction aims to find more defective software changes with limited code inspection cost. Traditionally, supervised models have been used; however, they require sufficient labelled training data, which is difficult to obtain, especially for new projects. Recently, Yang et al. proposed an unsupervised model (LT) and applied it to projects with rich historical bug data. Interestingly, they reported that, under the same inspection cost (i.e., 20% of the total lines of code modified by all changes), LT could find more defective changes than a state-of-the-art supervised model (i.e., EALR). This is surprising, as supervised models, which benefit from historical data, are expected to perform better than unsupervised ones. Their finding suggests that previous studies on defect prediction had made a simple problem too complex. Given the potentially high impact of Yang et al.'s work, we perform a replication study in this paper and present the following new findings: (1) Under the same inspection budget, LT requires developers to inspect a large number of changes, necessitating many more context switches. (2) Although LT finds more defective changes, many highly ranked changes are false alarms. These initial false alarms may negatively impact practitioners' patience and confidence. (3) LT does not outperform EALR when the harmonic mean of Recall and Precision (i.e., F1-score) is considered. Aside from highlighting the above findings, we propose a simple but improved supervised model called CBS. When compared with EALR, CBS detects about 15% more defective changes and also significantly improves Precision and F1-score. When compared with LT, CBS achieves similar results in terms of Recall, but it significantly reduces context switches and false alarms before first success. Finally, we also discuss the implications of our findings for practitioners and researchers.
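The evaluation measures named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `Change` type, the `evaluate` function, the rule of stopping at the first change that would exceed the budget, the use of the inspected-change count as a rough proxy for context switches, and the toy data are all assumptions made for illustration. The two rankings below (smallest change first, and a made-up model score) are hypothetical and do not reproduce LT, EALR, or CBS.

```python
from dataclasses import dataclass


@dataclass
class Change:
    name: str
    churn: int        # lines of code modified by this change
    defective: bool   # ground-truth label


def evaluate(ranked, budget_ratio=0.20):
    """Inspect changes in the given order until the LOC budget is spent.

    Assumption: inspection stops at the first change that would push the
    inspected LOC past budget_ratio of the total modified LOC.
    """
    budget = budget_ratio * sum(c.churn for c in ranked)
    inspected, spent = [], 0
    for c in ranked:
        if spent + c.churn > budget:
            break
        spent += c.churn
        inspected.append(c)

    hits = sum(c.defective for c in inspected)
    total_defective = sum(c.defective for c in ranked)
    recall = hits / total_defective if total_defective else 0.0
    precision = hits / len(inspected) if inspected else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0

    # Count non-defective changes inspected before the first defective one.
    false_alarms_before_first = 0
    for c in inspected:
        if c.defective:
            break
        false_alarms_before_first += 1

    return {
        "inspected_changes": len(inspected),  # rough proxy for context switches
        "recall": recall,
        "precision": precision,
        "f1": f1,
        "false_alarms_before_first": false_alarms_before_first,
    }


# Toy data (hypothetical): total churn is 100 lines, so the budget is 20 lines.
changes = [
    Change("A", 2, False), Change("B", 3, False), Change("C", 5, True),
    Change("D", 9, True), Change("E", 30, True), Change("F", 51, False),
]

# Ranking 1: smallest changes first (a size-based, label-free ordering).
by_size = evaluate(sorted(changes, key=lambda c: c.churn))

# Ranking 2: descending made-up scores standing in for a trained model.
scores = {"D": 0.9, "C": 0.8, "E": 0.7, "A": 0.3, "B": 0.2, "F": 0.1}
by_score = evaluate(sorted(changes, key=lambda c: scores[c.name], reverse=True))

print(by_size)
print(by_score)
```

On this toy data both rankings reach the same Recall, but the size-based ordering inspects more changes and meets false alarms first, which is the kind of trade-off the findings above describe. The numbers carry no empirical weight.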
