An Improved Method for Cross-Project Defect Prediction by Simplifying Training Data

He Peng; He Yao; Yu Lvjun; Li Bing

Abstract

Cross-project defect prediction (CPDP) on projects with limited historical data has attracted much attention. To the best of our knowledge, however, existing approaches usually perform poorly because the cross-project training data are of low quality. The objective of this study is to propose an improved method for CPDP that simplifies the training data, named TDSelector, which considers both the similarity of each training instance and the number of defects it contains (denoted by defects), and to demonstrate the effectiveness of the proposed method. Our work consists of three main steps. First, we constructed TDSelector as a linear weighted function of instances' similarity and defects. Second, we built the basic defect predictor used in our experiments with the Logistic Regression classification algorithm. Third, we analyzed how different combinations of similarity measure and defects normalization affect prediction performance, and then compared our method with two existing methods. We evaluated our method on 14 projects collected from two public repositories. The results suggest that the proposed TDSelector method performs, on average, better than both baseline methods, and the AUC values are increased by up to 10.6% and 4.3%, respectively. That is, the inclusion of defects is indeed helpful for selecting high-quality training instances for CPDP. In addition, the combination of Euclidean distance and linear normalization is the preferred configuration for TDSelector. An additional experiment also shows that directly selecting the instances with more bugs as training data can further improve the performance of the bug predictor trained by our method.
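The abstract describes the selection score only as a linear weighted function of similarity and defects, so the following Python sketch is an illustration under stated assumptions rather than the authors' implementation. The weight `alpha`, the conversion of Euclidean distance to a similarity via `1 / (1 + d)`, the use of a single target instance, the top-`k` cutoff, and all function names are hypothetical choices made here for clarity.

```python
import math


def euclidean_similarity(a, b):
    """Map Euclidean distance to (0, 1]. The 1/(1+d) form is an assumption."""
    return 1.0 / (1.0 + math.dist(a, b))


def linear_normalize(values):
    """Min-max scale values to [0, 1]; a constant list maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def select_training_instances(train_features, train_defects, target,
                              alpha=0.5, k=2):
    """Rank training instances by a weighted sum of similarity and defects.

    score = alpha * similarity + (1 - alpha) * normalized_defects
    (alpha and k are hypothetical parameters, not taken from the paper).
    Returns the indices of the k highest-scoring instances and all scores.
    """
    sims = [euclidean_similarity(x, target) for x in train_features]
    norm_defects = linear_normalize(train_defects)
    scores = [alpha * s + (1 - alpha) * d
              for s, d in zip(sims, norm_defects)]
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:k], scores


if __name__ == "__main__":
    features = [[0.0, 0.0], [3.0, 4.0], [1.0, 0.0]]  # toy metric vectors
    defects = [0, 4, 2]                              # toy defect counts
    chosen, scores = select_training_instances(
        features, defects, target=[0.0, 0.0], alpha=0.5, k=2)
    print(chosen, scores)
```

With `alpha=1.0` the ranking depends on similarity alone, and with `alpha=0.0` on the normalized defect count alone; intermediate values trade the two off. The selected instances would then be used to train a classifier such as logistic regression, as the abstract describes.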

Bibliographic details

Source: Mathematical Problems in Engineering, 2018, Issue 6, pp. 2650415.1-2650415.18 (18 pages)

Authors: He Peng; He Yao; Yu Lvjun; Li Bing

Author affiliations:

Hubei Univ, Sch Comp Sci & Informat Engn, Wuhan 430062, Hubei, Peoples R China;

Hubei Univ, Sch Comp Sci & Informat Engn, Wuhan 430062, Hubei, Peoples R China;

Hubei Univ, Sch Comp Sci & Informat Engn, Wuhan 430062, Hubei, Peoples R China;

Wuhan Univ, Sch Comp, Wuhan 430072, Hubei, Peoples R China;

Original format: PDF
Language: English

