Application of LSSVM and SMOTE on Seven Open Source Projects for Predicting Refactoring at Class Level

Abstract

Source code refactoring consists of modifying the structure of source code without changing its functionality or external behavior. We present a method for predicting refactoring candidates at the class level, which can help developers improve the design and structure of their source code while preserving its behavior. The method applies a machine-learning framework: Least Squares Support Vector Machines (LS-SVM) as the learning algorithm, Principal Component Analysis (PCA) for feature extraction, and the Synthetic Minority Over-sampling Technique (SMOTE) for handling imbalanced data. We start with 102 source code metrics as input features, which are reduced to 31 features after irrelevant and redundant features are removed through statistical tests. We conduct a series of experiments on a publicly available software engineering dataset consisting of seven open-source software systems in which the refactored classes have been manually validated. We apply LS-SVM with three different kernel functions: linear, polynomial, and Radial Basis Function (RBF). Statistical significance tests show that the RBF kernel outperforms the linear and polynomial kernels, while there is no statistically significant difference between the performance of the linear and polynomial kernels. The tests also show that models trained with SMOTE outperform those trained without it, and that models using all metrics outperform those using PCA-based metrics. The mean Area Under the Curve (AUC) for LS-SVM with the RBF kernel is 0.96.
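To make the SMOTE step concrete, here is a minimal pure-Python sketch of the usual SMOTE interpolation: each synthetic minority sample is placed at a random point on the segment between a minority sample and one of its k nearest minority-class neighbours. This is not the authors' implementation. The function name `smote`, the choice of drawing base samples at random, and the defaults `k=5` and `seed=0` are illustrative assumptions.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Create `n_synthetic` new minority-class samples by SMOTE-style interpolation.

    Sketch only: base samples are drawn at random (a common variant), and the
    defaults k=5 and seed=0 are illustrative assumptions, not values from the paper.
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)        # local RNG so results are reproducible
    k = min(k, len(minority) - 1)    # cannot have more neighbours than other points
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest minority neighbours of `base` by Euclidean distance, excluding itself
        neighbours = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: math.dist(base, minority[j]),
        )[:k]
        other = minority[rng.choice(neighbours)]
        gap = rng.random()           # uniform in [0, 1): position along the segment
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic
```

In a refactoring-prediction setting, `minority` would hold the metric vectors of whichever class is rarer (presumably the refactored classes). Appending the synthetic rows with that label to the training set balances the classes before the classifier is fit. Oversampling should be applied to the training folds only, so that synthetic points do not leak into the evaluation data.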

机译：源代码重构，包括在不更改其功能和外部行为的情况下修改源代码的结构。我们提出了一种在类级别预测重构候选者的方法，该方法可以帮助开发人员在保留行为的同时改善其源代码的设计和结构。我们提出了一种基于基于机器学习的框架的应用程序来预测重构候选者的技术。我们使用最小二乘支持向量机（LS-SVM）作为学习算法，使用主成分分析（PCA）作为特征提取技术，并使用综合少数族过采样技术（SMOTE）作为处理不平衡数据的技术。我们从102个源代码度量作为输入要素开始，然后通过统计测试删除了不相关和多余的要素，然后将其减少为31个要素。我们对可公开获得的软件工程数据集进行了一系列实验，该数据集由七个开放源代码软件系统组成，其中重构类通过手动验证。我们将LS-SVM应用到三个不同的函数：线性，多项式和径向基函数（RBF）。统计显着性检验表明，RBF内核的性能优于线性和多项式内核，但线性和多项式内核的性能没有统计学上的显着差异。统计显着性测试显示，使用SMOTE技术优于不使用SMOTE的技术，所有指标均优于基于PCA的指标。 LS-SVM RBF内核的曲线下面积（AUC）的平均值为0.96。
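The abstract does not spell out the LS-SVM formulation, so the sketch below assumes the standard LS-SVM classifier. In that formulation, training reduces to solving one linear system instead of a quadratic program. The sketch also shows the three kernels compared in the study and an AUC computed as the fraction of correctly ranked positive/negative pairs. The kernel and regularisation defaults (`gamma=10.0`, `sigma2=1.0`, `degree=2`, `c=1.0`) are placeholders, not the tuned settings from the paper. All names (`LSSVM`, `solve`, `auc`, and the kernel functions) are hypothetical.

```python
import math


# Three kernels compared in the study; hyperparameter defaults are placeholders.
def linear_kernel(x, z):
    return sum(a * b for a, b in zip(x, z))


def poly_kernel(x, z, degree=2, c=1.0):
    return (linear_kernel(x, z) + c) ** degree


def rbf_kernel(x, z, sigma2=1.0):
    return math.exp(-math.dist(x, z) ** 2 / sigma2)


def solve(A, rhs):
    """Solve A x = rhs by Gaussian elimination with partial pivoting (A is not modified)."""
    n = len(A)
    M = [[float(v) for v in row] + [float(r)] for row, r in zip(A, rhs)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


class LSSVM:
    """Binary LS-SVM classifier (labels must be -1 or +1).

    Training solves the linear system
        [ 0   y^T             ] [b    ]   [0]
        [ y   Omega + I/gamma ] [alpha] = [1]
    with Omega[k][j] = y_k * y_j * K(x_k, x_j).
    """

    def __init__(self, kernel=rbf_kernel, gamma=10.0):
        self.kernel = kernel
        self.gamma = gamma  # regularisation constant (placeholder value)

    def fit(self, X, y):
        n = len(X)
        A = [[0.0] + [float(v) for v in y]]
        for k in range(n):
            row = [float(y[k])]
            row += [y[k] * y[j] * self.kernel(X[k], X[j]) for j in range(n)]
            row[k + 1] += 1.0 / self.gamma  # ridge term on the diagonal
            A.append(row)
        sol = solve(A, [0.0] + [1.0] * n)
        self.b, self.alpha = sol[0], sol[1:]
        self.X, self.y = X, y
        return self

    def decision_function(self, x):
        return self.b + sum(
            a * yk * self.kernel(x, xk) for a, yk, xk in zip(self.alpha, self.y, self.X)
        )

    def predict(self, x):
        return 1 if self.decision_function(x) >= 0 else -1


def auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for s, lab in zip(scores, labels) if lab == 1]
    neg = [s for s, lab in zip(scores, labels) if lab != 1]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))
```

For example, take the four XOR points `X = [[0, 0], [0, 1], [1, 0], [1, 1]]` with labels `y = [-1, 1, 1, -1]`. Here `LSSVM(rbf_kernel).fit(X, y)` classifies all four training points correctly, which no linear decision function can do. Swapping in `poly_kernel` or `linear_kernel` is a one-argument change. Solving the system directly costs O(n³) time, so an experiment on thousands of classes would call a numerical library rather than this hand-written solver.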

Bibliographic Information

Source: Asia-Pacific Software Engineering Conference, 2017, pp. 90-99 (10 pages)
Authors: Lov Kumar; Ashish Sureka
Original format: PDF
Keywords: Measurement; Principal component analysis; Kernel; Software systems; Support vector machines; Feature extraction; Tools

Similar Literature

1. Predicting different levels of the unit testing effort of classes using source code metrics: a multiple case study on open-source software [J]. Fadel Toure, Mourad Badri, Luc Lamontagne. Innovations in Systems and Software Engineering, 2018, No. 1.
2. Predicting impacts of major projects on housing prices in resource based towns with a case study application to Gladstone, Australia [J]. Delwar Akbar, John Rolfe, S.M. Zobaidul Kabir. Resources Policy, 2013, No. 4.
3. A probabilistic model for predicting service level adherence of application support projects [J]. Srijith Sreenivasan, Manimaran Sundaram. International Journal of Productivity and Quality Management, 2018, No. 3.
4. Application of LSSVM and SMOTE on Seven Open Source Projects for Predicting Refactoring at Class Level [C]. Lov Kumar, Ashish Sureka. Asia-Pacific Software Engineering Conference, 2017.
5. Validation of self-reported hypertension status and predictors of uncontrolled blood pressure levels in the Community Initiative to Eliminate Stroke (CITIES) Project [D]. Dave, Gaurav J. 2011.
6. Multilevel Latent Class Analysis: An Application of Adolescent Smoking Typologies with Individual and Contextual Predictors [O]. Kimberly L. Henry, Bengt Muthén.
7. SMOTE-NC and gradient boosting imputation based random forest classifier for predicting severity level of covid-19 patients with blood samples [O]. Elif Ceren Gök, Mehmet Onur Olgun. 2021.