Journal: Arabian Journal for Science and Engineering. Section A, Sciences

HCAB‑SMOTE: A Hybrid Clustered Affinitive Borderline SMOTE Approach for Imbalanced Data Binary Classification


Abstract

A binary dataset is considered imbalanced when one of its two classes (the minority class) contains less than 40% of the total number of data instances. Existing classification algorithms are biased when applied to imbalanced binary datasets, as they misclassify instances of the minority class. Many techniques have been proposed to minimize this bias and to increase classification accuracy. The Synthetic Minority Oversampling Technique (SMOTE) is a well-known approach to this problem. It generates new synthetic data instances to balance the dataset. Unfortunately, it generates these instances randomly, which produces useless new instances and costs time and memory. Several SMOTE derivatives (such as Borderline SMOTE) were proposed to overcome this problem, yet the number of generated instances changed only slightly. To address this, the paper proposes a novel approach for generating synthetic data instances, known as Hybrid Clustered Affinitive Borderline SMOTE (HCAB-SMOTE). It minimizes the number of generated instances while increasing classification accuracy. It combines undersampling, which removes majority noise instances, with oversampling, which enhances the density of the borderline. It applies k-means clustering to the borderline area and identifies which clusters to oversample to achieve better results. Experimental results show that HCAB-SMOTE outperformed SMOTE, Borderline SMOTE, AB-SMOTE and CAB-SMOTE, the approaches developed before reaching HCAB-SMOTE, as it provided the highest classification accuracy with the fewest generated instances.
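The building blocks named in the abstract can be illustrated with a minimal sketch. It covers only the 40% imbalance criterion, plain SMOTE interpolation between a minority instance and one of its minority neighbours, and the usual Borderline SMOTE "danger" test (at least half, but not all, of the nearest neighbours belong to the majority class). The function names, the default k = 5 and the fixed random seed are assumptions made for illustration. The abstract does not specify how HCAB-SMOTE removes majority noise or selects k-means clusters, so those steps are not attempted here.

```python
import math
import random


def is_imbalanced(labels, threshold=0.40):
    """True when the smaller of the two classes holds less than `threshold` of all instances."""
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    if len(counts) != 2:
        raise ValueError("expected exactly two classes")
    return min(counts.values()) / len(labels) < threshold


def smote(minority, n_new, k=5, rng=None):
    """Plain SMOTE: interpolate between a random minority point and one of its k nearest minority neighbours."""
    if len(minority) < 2:
        raise ValueError("need at least two minority instances")
    rng = rng or random.Random(0)  # fixed seed: an assumption, for repeatability
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbour)))
    return synthetic


def borderline(minority, majority, k=5):
    """Minority points whose nearest neighbours (over both classes) are at least half, but not all, majority."""
    tagged = [(p, 0) for p in minority] + [(p, 1) for p in majority]
    danger = []
    for p in minority:
        nearest = sorted((t for t in tagged if t[0] is not p),
                         key=lambda t: math.dist(p, t[0]))[:k]
        n_major = sum(flag for _, flag in nearest)
        if len(nearest) / 2 <= n_major < len(nearest):
            danger.append(p)
    return danger
```

A borderline-restricted oversampler would, under these assumptions, call `borderline(...)` first and interpolate only from the returned points, which is one way fewer synthetic instances could be generated than with plain SMOTE.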
