Journal: Knowledge-Based Systems

A hierarchical genetic fuzzy system based on genetic programming for addressing classification with highly imbalanced and borderline data-sets

Abstract

Many real-world applications involve classification with imbalanced data-sets. This problem arises when the number of instances of one class differs markedly from the number of instances of the other class. Traditional classification algorithms do not handle this situation well because they are biased towards the majority class. As a result, they tend to misclassify the minority class, which is usually the class of greatest interest for the application at hand. Among the available learning approaches, fuzzy rule-based classification systems have performed well on imbalanced data-sets. In this work, we propose modifications that further improve the performance of these systems by making use of information granulation. Specifically, we exploit a positive synergy between data sampling methods and algorithmic modifications, creating a genetic programming approach that uses linguistic variables in a hierarchical way. These linguistic variables are adapted to the context of the problem by a genetic process that combines rule selection with the adjustment of the lateral position of the labels, based on the 2-tuples linguistic model. An experimental study is carried out over highly imbalanced and borderline imbalanced data-sets and is complemented by a statistical comparative analysis. The results show that the proposed model outperforms several fuzzy rule-based classification systems, including a hierarchical approach, and performs better than the C4.5 decision tree.
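The abstract mentions adjusting the lateral position of the labels with the 2-tuples linguistic model. The sketch below illustrates that idea under stated assumptions and is not the authors' implementation. It assumes the labels are triangular fuzzy sets in a uniform partition and that each label is displaced by a symbolic translation `alpha` in [-0.5, 0.5), scaled by the distance between adjacent label centers. The names `TriangularLabel`, `uniform_partition` and `lateral_shift` are hypothetical. In the described method the `alpha` values would be searched by the genetic process together with rule selection, and that search is not shown here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TriangularLabel:
    """Hypothetical triangular fuzzy label with vertices (left, center, right)."""
    left: float
    center: float
    right: float

    def membership(self, x: float) -> float:
        # 1 at the center, falling linearly to 0 at the left and right vertices.
        if x <= self.left or x >= self.right:
            return 0.0
        if x <= self.center:
            return (x - self.left) / (self.center - self.left)
        return (self.right - x) / (self.right - self.center)


def uniform_partition(lo: float, hi: float, n: int) -> list[TriangularLabel]:
    """Assumed setup: n uniformly spaced triangular labels whose centers span [lo, hi]."""
    if n < 2:
        raise ValueError("need at least two labels")
    step = (hi - lo) / (n - 1)
    return [
        TriangularLabel(lo + (i - 1) * step, lo + i * step, lo + (i + 1) * step)
        for i in range(n)
    ]


def lateral_shift(label: TriangularLabel, alpha: float, step: float) -> TriangularLabel:
    """Displace a label by the 2-tuple symbolic translation alpha in [-0.5, 0.5).

    The whole triangle moves by alpha * step, so the label keeps its shape and
    can slide at most halfway towards a neighbouring label center.
    """
    if not -0.5 <= alpha < 0.5:
        raise ValueError("alpha must lie in [-0.5, 0.5)")
    shift = alpha * step
    return TriangularLabel(label.left + shift, label.center + shift, label.right + shift)


if __name__ == "__main__":
    labels = uniform_partition(0.0, 1.0, 3)  # e.g. LOW, MEDIUM, HIGH on [0, 1]
    step = 0.5                               # distance between adjacent centers
    medium = labels[1]
    tuned = lateral_shift(medium, alpha=0.2, step=step)
    # Membership of x = 0.6 before and after moving MEDIUM to the right.
    print(medium.membership(0.6), tuned.membership(0.6))
```

Because the shift moves a label without changing its width, each label contributes a single real-valued parameter to the search. This is the usual rationale for lateral tuning, given here as a general observation rather than as a claim from the paper.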