Journal: IFAC PapersOnLine

A Noisy-sample-removed Under-sampling Scheme for Imbalanced Classification of Public Datasets


Abstract

Classification plays an important role in machine learning, and noisy samples in a dataset tend to reduce the performance of a classifier. This work proposes a clustering-based Noisy-sample-removed Under-sampling Scheme (NUS) for imbalanced classification. First, the samples in the minority class are clustered. For each cluster, the cluster center is taken as the center of a hypersphere, and the distance from that center to the farthest minority-class sample in the cluster is taken as its radius. The Euclidean distance from the cluster center to each majority-class sample is then calculated to decide whether that sample lies inside the hypersphere. Next, we propose a NUS-based policy to decide whether a majority-class sample inside the hypersphere is a noisy sample. The noisy samples of the minority class are found in a similar way. Second, we remove the noisy samples from both the majority and minority classes, which yields NUS. Finally, Logistic Regression, Decision Tree, and Random Forest are each used as the base classifier in NUS, and NUS is compared with Random Under-Sampling (RUS), EasyEnsemble (EE), and Inverse Random Under-Sampling (IRUS) on 13 public datasets. Results show that our method can improve classification performance in comparison with its state-of-the-art peers.
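The hypersphere step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that cluster labels for the minority samples are already available from some clustering method (the abstract does not name one) and that the cluster center is the mean of the cluster's samples. The function names `cluster_hyperspheres` and `majority_candidates` are hypothetical. The sketch only flags majority samples that fall inside a hypersphere; the NUS policy that decides which of those candidates are actually noisy is not specified in the abstract and is deliberately left out.

```python
import math
from collections import defaultdict


def cluster_hyperspheres(minority, labels):
    """Return {label: (center, radius)} for each minority cluster.

    Assumption: the center is the per-dimension mean of the cluster, and the
    radius is the distance from the center to its farthest minority sample.
    """
    groups = defaultdict(list)
    for point, label in zip(minority, labels):
        groups[label].append(point)

    spheres = {}
    for label, points in groups.items():
        dim = len(points[0])
        center = tuple(sum(p[i] for p in points) / len(points) for i in range(dim))
        radius = max(math.dist(center, p) for p in points)
        spheres[label] = (center, radius)
    return spheres


def majority_candidates(majority, spheres):
    """Return indices of majority samples lying inside at least one hypersphere.

    These are only candidates; deciding which ones are noisy needs the
    NUS policy, which this sketch does not implement.
    """
    inside = []
    for idx, point in enumerate(majority):
        if any(math.dist(center, point) <= radius
               for center, radius in spheres.values()):
            inside.append(idx)
    return inside


# Toy data (made up for illustration): two minority clusters in 2-D.
minority = [(0.0, 0.0), (2.0, 0.0), (10.0, 10.0), (10.0, 12.0)]
labels = [0, 0, 1, 1]
majority = [(1.0, 0.5), (5.0, 5.0), (10.0, 11.5), (20.0, 20.0)]

spheres = cluster_hyperspheres(minority, labels)
candidates = majority_candidates(majority, spheres)
print(candidates)  # [0, 2]
```

In the toy data each cluster has radius 1, so only the first and third majority samples fall inside a hypersphere and would be passed on to the noise-decision policy.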
