An Empirical Study for Software Fault-Proneness Prediction with Ensemble Learning Models on Imbalanced Data Sets

Renqing Li; Shihai Wang

Abstract

Software faults can cause serious system errors and failures, leading to huge economic losses, yet no current inspection or verification technique is able to find and eliminate all of them. Software testing is an important way to detect these faults and raise software reliability, but it is expensive. Estimating a module's fault-proneness helps minimize the testing resources required by directing resource allocation toward high-risk modules, which improves both the efficiency of testing and the reliability of the software. Software fault data sets, however, are inherently imbalanced: a small number of modules contain most of the faults, while most modules are fault-free. This imbalanced distribution is a real challenge for researchers in software fault-proneness prediction. In this paper, we investigate software fault-proneness prediction models built on software metrics using C4.5, SVM, KNN, Logistic, NaiveBayes, AdaBoost and SMOTEBoost. We perform an empirical study of the effectiveness of these models on imbalanced software fault data sets obtained from NASA's MDP. A comprehensive comparison of the experimental results shows that SMOTEBoost outperforms the other models at predicting high-risk software modules, with higher recall and AUC values. This demonstrates that the SMOTEBoost-based model is better able to estimate a module's fault-proneness and can thereby improve the efficiency of software testing.
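The abstract compares models by recall and AUC rather than plain accuracy. The following is a minimal sketch of why that matters on imbalanced fault data, not the authors' evaluation code. The toy labels and scores are invented for illustration, and the helper names (recall, accuracy, auc) are hypothetical.

```python
def accuracy(y_true, y_pred):
    """Fraction of modules whose predicted label matches the true label."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)


def recall(y_true, y_pred):
    """Fraction of truly faulty modules (label 1) that were predicted faulty."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp / (tp + fn) if (tp + fn) else 0.0


def auc(y_true, scores):
    """Probability that a random faulty module receives a higher score
    than a random fault-free module (ties count as one half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one module of each class")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Invented toy data: 10 modules, only 2 of them faulty (label 1).
y_true = [0] * 8 + [1] * 2

# A trivial model that calls every module fault-free.
all_clean = [0] * 10

# Invented fault-proneness scores from some hypothetical classifier.
scores = [0.1, 0.2, 0.15, 0.3, 0.05, 0.25, 0.6, 0.1, 0.7, 0.4]

if __name__ == "__main__":
    print("accuracy of all-clean model:", accuracy(y_true, all_clean))
    print("recall of all-clean model:  ", recall(y_true, all_clean))
    print("AUC of scored model:        ", auc(y_true, scores))
```

On this toy set the all-clean model reaches 80% accuracy while finding none of the faulty modules, which is why recall on the minority class and a ranking measure such as AUC are more informative for this kind of data.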

Bibliographic details

Source: Journal of Computers, 2014, No. 3, 8 pages
Authors: Renqing Li; Shihai Wang
Original format: PDF
Chinese Library Classification: Computing technology, computer technology
Keywords: SMOTEBoost; software fault-prone; prediction model; imbalanced data sets
