A Novel Oversampling Method for Imbalanced Datasets Based on Density Peaks Clustering

Cao Jie; Shi Yong

Abstract

Imbalanced data classification is a major challenge in data mining and machine learning, and oversampling is a widely used technique for resampling imbalanced data. Existing oversampling methods tend to introduce noise points and to generate overlapping instances. To address these problems, this paper proposes a novel oversampling method based on density peaks clustering. First, the density peaks clustering algorithm clusters the minority-class instances while screening out outliers. Second, sampling weights are assigned according to the sizes of the resulting sub-clusters, and new instances are synthesized by interpolating between each sub-cluster's core and the other instances of the same sub-cluster. Finally, comparative experiments are conducted on both artificial data and KEEL datasets. The experiments validate the feasibility and effectiveness of the algorithm, which improves classification accuracy on imbalanced data.
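
The three steps in the abstract can be illustrated with a minimal, hypothetical Python sketch. This is not the authors' implementation. It uses the usual density peaks quantities: a local density ρ (here a Gaussian-kernel variant with cutoff distance d_c) and δ, the distance to the nearest denser point. Cores are picked by the product ρ·δ. The abstract does not say how outliers are screened or whether larger sub-clusters receive more or fewer samples. This sketch therefore assumes two things. First, a point is an outlier when its density falls below `min_density_ratio` times the mean density. Second, each sub-cluster's quota of new points is proportional to its size. The function names `density_peaks` and `oversample`, their parameters, and the toy data are all illustrative.

```python
import math
import random


def density_peaks(points, dc, n_centers, min_density_ratio=0.1):
    """Density peaks clustering with a simple density-based outlier screen.

    Returns (labels, cores): labels[i] is a sub-cluster id, or -1 for a
    screened outlier; cores[k] is the index of sub-cluster k's density peak.
    """
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    # Local density rho_i (Gaussian-kernel variant, cutoff distance dc).
    rho = [sum(math.exp(-(dist[i][j] / dc) ** 2) for j in range(n) if j != i)
           for i in range(n)]
    # Hypothetical outlier rule: density far below the mean density.
    threshold = min_density_ratio * sum(rho) / n
    kept = [i for i in range(n) if rho[i] >= threshold]
    # Kept points from densest to sparsest (ties broken by index).
    order = sorted(kept, key=lambda i: (-rho[i], i))
    # delta_i: distance to the nearest denser point, which becomes parent[i].
    delta = {order[0]: max(dist[order[0]][j] for j in kept)}
    parent = {order[0]: None}
    for rank in range(1, len(order)):
        i = order[rank]
        j = min(order[:rank], key=lambda k: dist[i][k])
        delta[i], parent[i] = dist[i][j], j
    # Density peaks combine high rho and high delta: rank by rho * delta.
    cores = sorted(kept, key=lambda i: -rho[i] * delta[i])[:n_centers]
    if order[0] not in cores:  # the densest point has no parent, so it is a core
        cores[-1] = order[0]
    labels = [-1] * n
    for k, c in enumerate(cores):
        labels[c] = k
    # Every other kept point inherits its nearest denser neighbour's label.
    for i in order:
        if labels[i] == -1:
            labels[i] = labels[parent[i]]
    return labels, cores


def oversample(points, labels, cores, n_new, seed=0):
    """Synthesize n_new points by interpolating between each sub-cluster's
    core and other members of the same sub-cluster."""
    rng = random.Random(seed)
    members = {k: [i for i, lab in enumerate(labels) if lab == k]
               for k in range(len(cores))}
    total = sum(len(m) for m in members.values())
    # Assumed weighting: quota proportional to sub-cluster size, rounded
    # with the largest-remainder method so the quotas sum to n_new.
    raw = {k: n_new * len(m) / total for k, m in members.items()}
    quota = {k: math.floor(v) for k, v in raw.items()}
    leftover = n_new - sum(quota.values())
    for k in sorted(raw, key=lambda j: quota[j] - raw[j])[:leftover]:
        quota[k] += 1
    synthetic = []
    for k, core in enumerate(cores):
        # A singleton sub-cluster can only reproduce its core.
        others = [i for i in members[k] if i != core] or [core]
        c = points[core]
        for _ in range(quota[k]):
            x = points[rng.choice(others)]
            u = rng.random()  # position along the core-to-member segment
            synthetic.append(tuple(ci + u * (xi - ci) for ci, xi in zip(c, x)))
    return synthetic


# Toy data: two tight minority groups plus one isolated point (index 9).
minority = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3), (0.3, 0.2), (0.15, 0.15),
            (5.0, 5.0), (5.2, 5.1), (5.1, 5.3), (4.9, 5.2),
            (10.0, -10.0)]
labels, cores = density_peaks(minority, dc=0.5, n_centers=2)
new_points = oversample(minority, labels, cores, n_new=9, seed=1)
print(labels)
print(len(new_points))
```

On this toy input, the isolated point is screened out (label -1) and never used for interpolation. Every synthetic point therefore lies on a segment between a core and a member of its own sub-cluster. This sketch sets d_c and the number of cores by hand, which is also an assumption. The abstract does not say how the paper chooses them.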

Bibliographic details

Source: Technical Gazette, 2021, No. 6, 7 pages
Authors: Cao Jie; Shi Yong
Original format: PDF
CLC classification: General industrial technology
Keywords: classification; density peaks clustering; imbalanced datasets; oversampling

Similar literature

1. Adaptive weighted over-sampling for imbalanced datasets based on density peaks clustering with heuristic filtering [J]. Information Sciences: An International Journal, 2020.
2. Adaptive swarm cluster-based dynamic multi-objective synthetic minority oversampling technique algorithm for tackling binary imbalanced datasets in biomedical data classification [J]. Jinyan Li, Simon Fong, Yunsick Sung. BioData Mining, 2016, No. 1.
3. Fuzzy–synthetic minority oversampling technique: Oversampling based on fuzzy set theory for Android malware detection in imbalanced datasets [J]. Yanping Xu, Chunhua Wu, Kangfeng Zheng. International Journal of Distributed Sensor Networks, 2017, No. 4.
4. Density-induced oversampling for highly imbalanced datasets [C]. Daniel Fecker, Volker Maergner, Tim Fingscheidt. Image processing: machine vision applications VI, 2013.
5. Active learning with support vector machines for imbalanced datasets and a method for stopping active learning based on stabilizing predictions [D]. Michael Bloodgood. 2009.
6. Adaptive swarm cluster-based dynamic multi-objective synthetic minority oversampling technique algorithm for tackling binary imbalanced datasets in biomedical data classification [O]. Jinyan Li, Simon Fong, Yunsick Sung. 2016.
7. Adaptive swarm cluster-based dynamic multi-objective synthetic minority oversampling technique algorithm for tackling binary imbalanced datasets in biomedical data classification [O]. 2016.
