A memetic approach for training set selection in imbalanced data sets

Nikpour Bahareh; Nezamabadi-pour Hossein

Abstract

Imbalanced data classification is a challenging problem in machine learning. The problem occurs when data samples are unevenly distributed among the classes and classical classifiers are not suitable for classifying such datasets. To overcome this problem, this paper selects the best training samples from the data, with the goal of improving classifier performance on imbalanced data. To do so, some heuristic methods are presented that use local information to indicate whether each sample of the training set should be removed or retained. These methods are then treated as local search algorithms and combined with a global search algorithm in a framework to form memetic algorithms. The global search used in this paper is the binary quantum-inspired gravitational search algorithm (BQIGSA), a new metaheuristic for optimizing binary-encoded problems. BQIGSA is employed because a highly stochastic search algorithm is sought for this problem. We propose six different local search algorithms, three of which are application-oriented and designed for this problem while the rest are general, and the best local search is then determined. Experiments are performed on 45 standard datasets, with G-mean and AUC as the evaluation criteria. The datasets are then used to compare the best memetic approaches with some popular state-of-the-art algorithms as well as a recently proposed memetic algorithm, and the results show the superiority of the proposed approaches. Finally, the performance of the proposed algorithm is evaluated with four different classifiers, and the best classifier for use with this method is determined.
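
To make the idea of binary-encoded training set selection concrete, the following is a minimal sketch, not the authors' implementation. It assumes each candidate solution is a 0/1 mask over the training samples and scores a mask by the G-mean of a 1-nearest-neighbour classifier trained on the selected samples. The function names (g_mean, one_nn_predict, mask_fitness), the choice of 1-NN, and the use of a separate validation set are all illustrative assumptions; the abstract does not specify the fitness function.

```python
import numpy as np

def g_mean(y_true, y_pred):
    """Geometric mean of sensitivity and specificity for binary labels (1 = minority)."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    pos = y_true == 1
    neg = y_true == 0
    sensitivity = np.mean(y_pred[pos] == 1) if pos.any() else 0.0
    specificity = np.mean(y_pred[neg] == 0) if neg.any() else 0.0
    return float(np.sqrt(sensitivity * specificity))

def one_nn_predict(X_train, y_train, X_query):
    """Give each query point the label of its nearest training point (Euclidean)."""
    dists = np.linalg.norm(X_query[:, None, :] - X_train[None, :, :], axis=2)
    return y_train[np.argmin(dists, axis=1)]

def mask_fitness(mask, X, y, X_val, y_val):
    """Hypothetical fitness: G-mean on validation data using only the selected samples."""
    mask = np.asarray(mask, dtype=bool)
    if not mask.any():  # an empty training set cannot classify anything
        return 0.0
    return g_mean(y_val, one_nn_predict(X[mask], y[mask], X_val))

# Toy data: three majority samples (label 0) and one minority sample (label 1).
X = np.array([[0.0], [1.0], [6.0], [10.0]])
y = np.array([0, 0, 0, 1])
X_val = np.array([[0.5], [9.0], [7.5]])
y_val = np.array([0, 1, 1])

print(mask_fitness([1, 1, 1, 1], X, y, X_val, y_val))  # keep every sample
print(mask_fitness([1, 1, 0, 1], X, y, X_val, y_val))  # drop the majority sample at 6.0
```

In this toy data, dropping the majority sample that sits close to the minority region raises the G-mean, which is the kind of improvement a search over masks would try to find. A global search such as BQIGSA, or any of the local searches, would propose the masks; that part is not sketched here.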

Bibliographic details

Source
International Journal of Machine Learning and Cybernetics | 2019, issue 11 | pp. 3043-3070 | 28 pages
Authors
Nikpour Bahareh; Nezamabadi-pour Hossein
Affiliations
Shahid Bahonar Univ Kerman, Dept Elect Engn, IDPL, Kerman, Iran | Shahid Bahonar Univ Kerman, Mahani Math Res Ctr, Kerman, Iran
Original format: PDF
Language: English
Keywords
Imbalanced data; Under-sampling methods; Training set selection; Metaheuristics; Memetic algorithms; Binary quantum-inspired gravitational search algorithm

Similar literature

1. SMOTE-RSB*: a hybrid preprocessing approach based on oversampling and undersampling for high imbalanced data-sets using SMOTE and rough sets theory [J]. Enislay Ramentol, Yailé Caballero, Rafael Bello, Knowledge and Information Systems. 2012, issue 2
2. Enhancing the effectiveness and interpretability of decision tree and rule induction classifiers with evolutionary training set selection over imbalanced problems [J]. Salvador Garcia, Alberto Fernandez, Francisco Herrera, Applied Soft Computing. 2009, issue 4
3. Editing Training Sets from Imbalanced Data Using Fuzzy-Rough Sets [C]. Do Van Nguyen, Keisuke Ogawa, Kazunori Matsumoto, IFIP WG 12.5 International Conference on Artificial Intelligence Applications and Innovations. 2015
4. Fault Detection Framework for Imbalanced and Sparsely-Labeled Data Sets Using Self-Organizing Maps [D]. Shah, Rushit N. 2018
5. Peculiar Genes Selection: A new features selection method to improve classification performances in imbalanced data sets [O]. Federica Martina, Marco Beccuti, Gianfranco Balbo
6. Editing Training Sets from Imbalanced Data Using Fuzzy-Rough Sets [O]. Nguyen, Do; Ogawa, Keisuke; Matsumoto, Kazunori. 2015
7. Fuzzy sets, rough sets, and modeling evidence: Theory and Application. A Dempster-Shafer based approach to compromise decision making with multiattributes applied to product selection [R]. Dekorvin, Andre. 1992
