Venue: Asia-Pacific Software Engineering Conference

Application of LSSVM and SMOTE on Seven Open Source Projects for Predicting Refactoring at Class Level



Abstract

Source code refactoring consists of modifying the structure of source code without changing its functionality or external behavior. We present a method for predicting refactoring candidates at the class level, which can help developers improve the design and structure of their source code while preserving its behavior. The technique is based on a machine learning framework: we use Least Squares Support Vector Machines (LS-SVM) as the learning algorithm, Principal Component Analysis (PCA) as a feature extraction technique, and the Synthetic Minority Over-sampling Technique (SMOTE) to handle imbalanced data. We start with 102 source code metrics as input features, which are reduced to 31 after irrelevant and redundant features are removed through statistical tests. We conduct a series of experiments on a publicly available software engineering dataset consisting of seven open-source software systems in which the refactored classes have been manually validated. We apply LS-SVM with three different kernel functions: linear, polynomial, and Radial Basis Function (RBF). Statistical significance tests demonstrate that the RBF kernel outperforms the linear and polynomial kernels, while there is no statistically significant difference between the performance of the linear and polynomial kernels. The tests also reveal that training with SMOTE outperforms training without SMOTE, and that using all metrics outperforms using PCA-based metrics. The mean Area Under Curve (AUC) for the LS-SVM with RBF kernel is 0.96.
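The abstract does not give implementation details, so the following is only a minimal sketch of two of the named building blocks: SMOTE oversampling of the minority class followed by an LS-SVM classifier with an RBF kernel. It assumes the standard LS-SVM classifier formulation, in which training reduces to solving one linear system. The function names, the toy two-dimensional data, and the hyperparameters `gamma`, `sigma`, and `k` are all illustrative assumptions rather than values from the paper, and the feature selection and PCA steps are omitted.

```python
import math
import random

def rbf_kernel(a, b, sigma=1.0):
    """RBF kernel exp(-||a - b||^2 / (2 * sigma^2))."""
    sq_dist = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))

def smote(minority, n_new, k=3, rng=None):
    """Create n_new synthetic points, each interpolated between a minority
    sample and one of its k nearest minority neighbours.
    Needs at least two minority samples."""
    rng = rng or random.Random(0)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        others = sorted((p for p in minority if p is not base),
                        key=lambda p: math.dist(p, base))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position on the segment base -> neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x

def lssvm_fit(xs, ys, gamma=10.0, sigma=1.0):
    """Train an LS-SVM classifier (labels +1 / -1) by solving
    [[0, y^T], [y, Omega + I/gamma]] [b; alpha] = [0; 1],
    with Omega_ij = y_i * y_j * K(x_i, x_j)."""
    n = len(xs)
    a = [[0.0] + [float(y) for y in ys]]
    for i in range(n):
        row = [float(ys[i])]
        for j in range(n):
            v = ys[i] * ys[j] * rbf_kernel(xs[i], xs[j], sigma)
            row.append(v + (1.0 / gamma if i == j else 0.0))
        a.append(row)
    sol = solve(a, [0.0] + [1.0] * n)
    return sol[0], sol[1:]  # bias b, support values alpha

def lssvm_predict(x, xs, ys, b, alphas, sigma=1.0):
    """Sign of sum_i alpha_i * y_i * K(x, x_i) + b."""
    score = b + sum(al * y * rbf_kernel(x, xi, sigma)
                    for al, y, xi in zip(alphas, ys, xs))
    return 1 if score >= 0 else -1

# Toy data (invented for illustration): -1 = not refactored, +1 = refactored.
majority = [[0.0, 0.1], [0.2, -0.1], [-0.1, 0.3], [0.3, 0.2],
            [-0.2, -0.2], [0.1, 0.4], [0.4, -0.3], [-0.3, 0.1]]
minority = [[3.0, 3.1], [2.8, 3.3], [3.2, 2.9]]

# Oversample the minority class until both classes have the same size.
synthetic = smote(minority, n_new=len(majority) - len(minority))
xs = majority + minority + synthetic
ys = [-1] * len(majority) + [1] * (len(minority) + len(synthetic))

b, alphas = lssvm_fit(xs, ys)
print(lssvm_predict([3.0, 3.0], xs, ys, b, alphas))
print(lssvm_predict([0.0, 0.0], xs, ys, b, alphas))
```

In this sketch SMOTE is applied before training only. In a real evaluation it would normally be restricted to the training folds so that synthetic samples do not leak into the test data. The dense Gaussian elimination is adequate for a toy example, but a numerical library would be the more sensible choice for datasets of realistic size.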
