Journal: Soft Computing: A Fusion of Foundations, Methodologies and Applications

Binary teaching-learning-based optimization algorithm with a new update mechanism for sample subset optimization in software defect prediction


Abstract

Software defect prediction has gained considerable attention in recent years. A broad range of computational methods has been developed to accurately predict faulty modules from code and design metrics. One challenge in training classifiers is the highly imbalanced class distribution of the available datasets, which leads to an undesirable bias in prediction performance for the minority class. Data sampling is a widely used technique for tackling this problem. However, traditional sampling methods, which depend mainly on random resampling from a given dataset, do not take advantage of useful information available in the training set, such as sample quality and representative instances. To cope with this limitation, evolutionary undersampling methods are commonly used to identify an optimal sample subset of the training dataset. This paper proposes a binary teaching-learning-based optimization algorithm with a distribution-based solution update rule, named BTLBOd, to generate a balanced subset of highly valuable examples. This subset is then used to train a classifier for reliable prediction of potentially defective modules in a software system. Each individual in BTLBOd comprises two vectors: a real-valued vector generated by the distribution-based update mechanism, and a binary vector produced from the corresponding real-valued vector by a proposed mapping function. Empirical results showed that the optimal sample subset produced by BTLBOd may improve the classification accuracy of the predictor on highly imbalanced software defect data. The results also demonstrated the superior performance of the proposed sampling method compared with other popular sampling techniques.
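The two-vector representation described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the mapping function or the distribution-based update rule, so everything below is an assumption made for illustration: the sigmoid-plus-threshold mapping, the helper names `sigmoid`, `to_binary`, `select_subset`, and `class_counts`, and the toy data are all hypothetical and are not taken from the paper.

```python
import math
import random


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def to_binary(real_vec, threshold=0.5):
    """Map a real-valued vector to a 0/1 mask.

    ASSUMPTION: sigmoid followed by a threshold. The paper proposes its
    own mapping function, which the abstract does not describe.
    """
    return [1 if sigmoid(x) >= threshold else 0 for x in real_vec]


def select_subset(samples, mask):
    """Keep the training samples whose mask bit is 1."""
    return [s for s, bit in zip(samples, mask) if bit == 1]


def class_counts(subset):
    """Return (defective, non_defective) counts for (features, label) pairs."""
    defective = sum(1 for _, label in subset if label == 1)
    return defective, len(subset) - defective


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy imbalanced data: 3 defective (label 1) and 9 clean (label 0) modules.
    samples = [([rng.random()], 1 if i < 3 else 0) for i in range(12)]
    # One individual: a real-valued vector with one entry per training sample.
    # Here it is drawn at random; in BTLBOd it would come from the update rule.
    real_vec = [rng.uniform(-2.0, 2.0) for _ in samples]
    mask = to_binary(real_vec)
    subset = select_subset(samples, mask)
    print("mask:", mask)
    print("defective / non-defective in subset:", class_counts(subset))
```

In an evolutionary undersampling loop, a fitness function would typically train a classifier on `subset` and score it on held-out data, so that the search favors masks yielding balanced, informative subsets. That step is omitted here because the abstract gives no details about it.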