Journal: Connection Science

A weighted pattern matching approach for classification of imbalanced data with a fireworks-based algorithm for feature selection



Abstract

Learning a classifier from imbalanced data is a challenging problem in machine learning. A dataset is said to be imbalanced when the number of instances belonging to one class is much smaller than the number of instances belonging to the other class. Classifiers that prove efficient on standard data fail when the data is imbalanced, because they are overtrained on the majority-class instances. Since class imbalance is a common characteristic of real-world data, better classifiers are needed. This paper proposes a novel instance-based classification algorithm, Weighted Pattern Matching based Classification (PMC+), for classifying imbalanced data. PMC+ classifies an unlabelled instance by computing the absolute difference between the feature values of the instances in the dataset and those of the unlabelled instance. PMC+ employs a simple classification procedure with weights and shows reasonably good performance. To improve the performance of PMC+, a Fireworks-based Feature and Weight Selection algorithm built on the idea of PMC+ is also proposed. PMC+ is evaluated on 44 binary imbalanced datasets and 15 multiclass imbalanced datasets. Although PMC+ does not employ a resampling or cost-sensitive method, experiments show that PMC+ is effective for classification of imbalanced data. The experimental results were validated using various non-parametric statistical tests.
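The abstract describes the core idea only at a high level: weighted absolute differences between the feature values of the stored instances and the unlabelled instance. The sketch below is one possible reading of that idea, not the paper's algorithm. The function name `classify`, the per-class averaging of differences, and the "smallest mean difference wins" decision rule are all assumptions made for illustration; the abstract does not specify how the differences are aggregated or how the weights are obtained.

```python
from collections import defaultdict


def classify(train_X, train_y, weights, x):
    """Hypothetical sketch: pick the class with the smallest mean
    weighted absolute difference to the unlabelled instance x.

    Averaging per class (an assumption, not taken from the paper)
    keeps the class size from influencing the score directly.
    """
    totals = defaultdict(float)  # summed weighted differences per class
    counts = defaultdict(int)    # number of training instances per class
    for row, label in zip(train_X, train_y):
        # Weighted sum of absolute feature-wise differences.
        diff = sum(w * abs(a - b) for w, a, b in zip(weights, row, x))
        totals[label] += diff
        counts[label] += 1
    return min(totals, key=lambda c: totals[c] / counts[c])


# Toy imbalanced data: four majority instances, one minority instance.
train_X = [[0.0, 0.0], [0.1, 0.2], [0.2, 0.1], [0.0, 0.1], [1.0, 1.0]]
train_y = ["maj", "maj", "maj", "maj", "min"]
weights = [1.0, 1.0]  # illustrative; a weight of 0 would drop a feature

print(classify(train_X, train_y, weights, [0.9, 0.8]))
print(classify(train_X, train_y, weights, [0.1, 0.1]))
```

In this reading, setting a feature's weight to zero removes it from the comparison, which suggests how a search procedure such as the Fireworks-based algorithm named in the abstract could select features and weights together. That connection is an interpretation of the abstract rather than a description of the published method.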
