International Journal of Machine Learning and Cybernetics

A memetic approach for training set selection in imbalanced data sets

Abstract

Imbalanced data classification is a challenging problem in machine learning. It arises when samples are unevenly distributed among the classes, a setting in which classical classifiers perform poorly. To address this, in this paper the best training samples are selected from the available data, with the goal of improving classifier performance on imbalanced data. To do so, several heuristic methods are presented that use local information to indicate whether each sample of the training set should be removed or retained. These methods are then treated as local search algorithms and combined with a global search algorithm in a single framework to form memetic algorithms. The global search used in this paper is the binary quantum-inspired gravitational search algorithm (BQIGSA), a recent metaheuristic for optimizing binary-encoded problems. BQIGSA is employed because the problem calls for a highly stochastic search algorithm. We propose six different local search algorithms, three of which are application-oriented (designed specifically for this problem) while the other three are general-purpose, and the best local search is then identified. Experiments are performed on 45 standard datasets, with G-mean and AUC as the evaluation criteria. The same datasets are then used to compare the best memetic approaches with several popular state-of-the-art algorithms and a recently proposed memetic algorithm, and the results show the superiority of the proposed approaches. Finally, the performance of the proposed algorithm is evaluated with four different classifiers, and the best classifier to use with this method is determined.
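The abstract describes the memetic framework only at a high level. The following is a minimal sketch, under stated assumptions, of how a local search plugs into a global search for training set selection. It assumes a binary mask encoding (bit i = 1 keeps training sample i), a 1-nearest-neighbour classifier, and leave-one-out G-mean (the square root of sensitivity times specificity) as the fitness. The global search here is a plain evolutionary loop that stands in for BQIGSA (it is not BQIGSA), and the greedy bit-flip local search is a generic one, not any of the paper's six local searches. All function names and parameters (`fitness`, `local_search`, `memetic_select`, `pop_size`, `p_mut`) are hypothetical.

```python
import math
import random


def g_mean(y_true, y_pred, positive=1):
    """Geometric mean of sensitivity (TPR) and specificity (TNR) for a binary task."""
    tp = fn = tn = fp = 0
    for t, p in zip(y_true, y_pred):
        if t == positive:
            tp, fn = (tp + 1, fn) if p == positive else (tp, fn + 1)
        else:
            fp, tn = (fp + 1, tn) if p == positive else (fp, tn + 1)
    sens = tp / (tp + fn) if tp + fn else 0.0
    spec = tn / (tn + fp) if tn + fp else 0.0
    return math.sqrt(sens * spec)


def fitness(mask, X, y):
    """Hypothetical fitness: G-mean of a 1-NN classifier whose reference set is the
    samples with mask bit 1, scored leave-one-out over the whole training set."""
    selected = [i for i, bit in enumerate(mask) if bit]
    if len({y[i] for i in selected}) < 2:
        return 0.0  # reject subsets that drop an entire class (or are empty)
    preds = []
    for j, xj in enumerate(X):
        pool = [i for i in selected if i != j]  # a sample never votes for itself
        nearest = min(pool, key=lambda i: sum((a - b) ** 2 for a, b in zip(X[i], xj)))
        preds.append(y[nearest])
    return g_mean(y, preds)


def local_search(mask, X, y, rng):
    """Generic first-improvement pass: flip each bit once, keep it only if fitness rises."""
    best = list(mask)
    best_fit = fitness(best, X, y)
    order = list(range(len(best)))
    rng.shuffle(order)
    for i in order:
        best[i] ^= 1
        f = fitness(best, X, y)
        if f > best_fit:
            best_fit = f
        else:
            best[i] ^= 1  # revert the flip
    return best, best_fit


def memetic_select(X, y, pop_size=6, generations=15, p_mut=0.1, seed=0):
    """Memetic loop: an evolutionary global search (stand-in, NOT BQIGSA) whose
    offspring are refined by local_search before steady-state replacement."""
    rng = random.Random(seed)
    n = len(X)
    # Seed the population with the full training set plus random subsets.
    pop = [[1] * n] + [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size - 1)]
    fits = [fitness(m, X, y) for m in pop]
    for _ in range(generations):
        a, b = rng.sample(range(pop_size), 2)
        child = [pop[a][k] if rng.random() < 0.5 else pop[b][k] for k in range(n)]  # uniform crossover
        child = [bit ^ 1 if rng.random() < p_mut else bit for bit in child]  # bit-flip mutation
        child, child_fit = local_search(child, X, y, rng)  # the "memetic" refinement step
        worst = min(range(pop_size), key=fits.__getitem__)
        if child_fit > fits[worst]:  # replace the worst only on improvement
            pop[worst], fits[worst] = child, child_fit
    best = max(range(pop_size), key=fits.__getitem__)
    return pop[best], fits[best]


if __name__ == "__main__":
    # Toy imbalanced data: eight majority (0) samples, three minority (1) samples.
    X = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (0.3, 0.3), (1.0, 1.0), (1.1, 0.9),
         (0.9, 1.2), (1.2, 1.1), (0.5, 0.6), (2.0, 2.1), (2.1, 1.9)]
    y = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1]
    mask, fit = memetic_select(X, y)
    print("kept samples:", [i for i, bit in enumerate(mask) if bit], "G-mean:", round(fit, 3))
```

Because the full training set is in the initial population and a member is replaced only by a strictly fitter child, the returned subset is never worse (under this fitness) than using all samples. In the paper, BQIGSA would take the place of the crossover/mutation step and the problem-specific heuristics would take the place of `local_search`.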