Journal: Technical Gazette

A Novel Oversampling Method for Imbalanced Datasets Based on Density Peaks Clustering

Abstract

Imbalanced data classification is a major challenge in data mining and machine learning, and oversampling is a widely used technique for resampling imbalanced data. Existing oversampling methods tend to introduce noise points and generate overlapping instances; to address these problems, this paper proposes a novel oversampling method based on density peaks clustering. First, the density peaks clustering algorithm is used to cluster the minority-class instances while screening out outliers. Second, sampling weights are assigned according to the sizes of the resulting sub-clusters, and new instances are synthesized by interpolating between each cluster core and the other instances of the same sub-cluster. Finally, comparative experiments are conducted on both artificial data and KEEL datasets. The experimental results validate the feasibility and effectiveness of the algorithm and show that it improves classification accuracy on imbalanced data.
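The abstract gives only an outline of the three steps, so the following is a minimal plain-Python sketch of one plausible reading, not the authors' implementation. Several details are assumptions not stated in the abstract: local density is a cutoff-kernel count within a radius `d_c`; minority points with no neighbour within `d_c` are treated as outliers; sub-cluster centers are picked with the usual density-peaks decision rule (largest gamma = rho * delta); each sub-cluster's density peak is taken as its "cluster core"; and each sub-cluster's share of new samples is proportional to its size. The function names `screen_outliers`, `density_peaks` and `dpc_oversample` are hypothetical.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two equal-length points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def screen_outliers(points, d_c, min_neighbors=1):
    """Hypothetical outlier filter: keep points with >= min_neighbors others within d_c."""
    return [
        p for i, p in enumerate(points)
        if sum(1 for j, q in enumerate(points) if j != i and euclidean(p, q) < d_c) >= min_neighbors
    ]


def density_peaks(points, d_c, n_clusters):
    """Density peaks clustering with a cutoff kernel.

    Returns (labels, centers): labels[i] is the sub-cluster id of points[i] and
    centers[k] is the index of the density peak of sub-cluster k.
    """
    n = len(points)
    dist = [[euclidean(p, q) for q in points] for p in points]
    # Local density rho: number of other points closer than the cutoff d_c.
    rho = [sum(1 for j in range(n) if j != i and dist[i][j] < d_c) for i in range(n)]
    # Visit points from highest to lowest density; the index breaks ties.
    order = sorted(range(n), key=lambda i: (-rho[i], i))
    delta = [0.0] * n
    parent = [-1] * n  # nearest point that comes earlier (denser) in `order`
    delta[order[0]] = max(dist[order[0]])
    for pos in range(1, n):
        i = order[pos]
        j = min(order[:pos], key=lambda m: dist[i][m])
        delta[i], parent[i] = dist[i][j], j
    # Centers: the global density peak plus the points with the largest
    # gamma = rho * delta among the rest (the usual decision-graph rule).
    rest = sorted(order[1:], key=lambda i: -rho[i] * delta[i])
    centers = [order[0]] + rest[: n_clusters - 1]
    labels = [-1] * n
    for k, c in enumerate(centers):
        labels[c] = k
    # Every other point joins the sub-cluster of its nearest denser neighbour.
    for i in order:
        if labels[i] == -1:
            labels[i] = labels[parent[i]]
    return labels, centers


def dpc_oversample(minority, n_new, d_c, n_clusters, seed=0):
    """Synthesize n_new minority points by core-to-member interpolation."""
    rng = random.Random(seed)
    pts = screen_outliers(minority, d_c)
    if len(pts) < 2:
        return []
    labels, centers = density_peaks(pts, d_c, n_clusters)
    # Non-core members of each sub-cluster (the core is its density peak).
    members = {k: [i for i, lab in enumerate(labels) if lab == k and i != c]
               for k, c in enumerate(centers)}
    eligible = [k for k in members if members[k]]
    if not eligible:
        return []
    # Assumed weighting: quota proportional to sub-cluster size (core included).
    sizes = {k: len(members[k]) + 1 for k in eligible}
    total = sum(sizes.values())
    counts = {k: n_new * sizes[k] // total for k in eligible}
    # Hand the rounding remainder to the largest sub-clusters first.
    for k in sorted(eligible, key=lambda m: -sizes[m])[: n_new - sum(counts.values())]:
        counts[k] += 1
    synthetic = []
    for k in eligible:
        core = pts[centers[k]]
        for _ in range(counts[k]):
            other = pts[rng.choice(members[k])]
            u = rng.random()  # interpolation factor in [0, 1)
            synthetic.append(tuple(c + u * (o - c) for c, o in zip(core, other)))
    return synthetic


if __name__ == "__main__":
    minority = [(0, 0), (0.5, 0), (0, 0.5), (0.5, 0.5),
                (10, 10), (10.5, 10), (10, 10.5), (50, 50)]
    new_points = dpc_oversample(minority, n_new=10, d_c=1.0, n_clusters=2)
    print(len(new_points))  # 10
```

In the usage example, the isolated point (50, 50) has no neighbour within `d_c`, so it is screened out and never used as an interpolation endpoint; every synthetic point lies on a segment from a sub-cluster's density peak to one of its members, which keeps new instances inside their own sub-cluster rather than in the gap between sub-clusters. In practice `n_new` would plausibly be set to the majority/minority count difference to balance the classes, but that is also an assumption. The full pairwise distance matrix makes this sketch O(n²), which is fine for illustration but not tuned for large datasets.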