NearCount: Selecting critical instances based on the cited counts of nearest neighbors


Abstract

Traditional instance selection algorithms are not good at addressing imbalanced problems. Moreover, most of them are sensitive to noisy instances and suffer from complex selection rules. To solve these problems, in this paper, we propose a concise learning framework named NearCount to determine the importance of an instance without editing noise. In NearCount, the importance of an instance corresponds to its cited count. The count is the number of times that the instance is selected as a nearest neighbor of instances in different classes. For instances with nonzero cited counts, the importance of the instance is inversely proportional to the cited count. To handle classification problems with different data distributions, two detailed NearCount-based algorithms, NearCount-IM and NearCount-IS, are introduced. For imbalanced problems, NearCount-IM selects the important majority instances in a number equal to the number of minority instances, thus balancing the data distribution. For balanced scenarios, NearCount-IS selects, in every class, the instances whose cited counts are greater than zero and less than or equal to the number of nearest neighbors as critical instances. The proposed NearCount-IM and NearCount-IS algorithms are evaluated by comparing them with classical instance selection algorithms on benchmark data sets. Experiments validate the effectiveness of the proposed algorithms. (C) 2019 Elsevier B.V. All rights reserved.
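
The counting rule described in the abstract can be illustrated with a minimal Python sketch. It is only one reading of the abstract, not the authors' implementation. The function names (`cited_counts`, `select_is`, `select_im`), the Euclidean distance, the tie-breaking by index order, and the choice to search each instance's k nearest neighbors among all instances of other classes are assumptions made for illustration.

```python
from math import dist


def cited_counts(X, y, k):
    """Count how often each instance appears among the k nearest
    different-class neighbors of another instance (assumed reading)."""
    counts = [0] * len(X)
    for i, (xi, yi) in enumerate(zip(X, y)):
        # Candidate neighbors: instances whose label differs from yi.
        others = [j for j in range(len(X)) if y[j] != yi]
        # Euclidean distance is an assumption; ties keep index order.
        others.sort(key=lambda j: dist(xi, X[j]))
        for j in others[:k]:
            counts[j] += 1
    return counts


def select_is(X, y, k):
    """Balanced case: keep instances with 0 < cited count <= k."""
    counts = cited_counts(X, y, k)
    return [i for i, c in enumerate(counts) if 0 < c <= k]


def select_im(X, y, k, minority_label):
    """Imbalanced case: keep every minority instance plus at most the
    same number of majority instances, preferring small nonzero counts."""
    counts = cited_counts(X, y, k)
    minority = [i for i, label in enumerate(y) if label == minority_label]
    cited_majority = [i for i, label in enumerate(y)
                      if label != minority_label and counts[i] > 0]
    # Importance is taken as 1 / count, so a smaller count ranks first.
    cited_majority.sort(key=lambda i: counts[i])
    return sorted(minority + cited_majority[:len(minority)])


# Toy one-dimensional data: four majority points, two minority points.
X = [(0.0,), (1.0,), (2.0,), (3.0,), (10.0,), (11.0,)]
y = [0, 0, 0, 0, 1, 1]

print(cited_counts(X, y, k=2))
print(select_is(X, y, k=2))
print(select_im(X, y, k=2, minority_label=1))
```

On this toy data with k = 2, the two majority points closest to the minority class (indices 2 and 3) are the only majority instances with a nonzero count, so both selection rules keep them. How the paper fills the quota when fewer majority instances are cited than there are minority instances is not stated in the abstract, so the sketch simply returns fewer.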

Bibliographic details

  • Source
    Knowledge-Based Systems | 2020, Issue 29 | 105196.1-105196.17 | 17 pages
  • Authors

  • Author affiliations

    East China Univ Sci & Technol Minist Educ Key Lab Adv Control & Optimizat Chem Proc Shanghai 200237 Peoples R China|East China Univ Sci & Technol Dept Comp Sci & Engn Shanghai 200237 Peoples R China;

    East China Univ Sci & Technol Dept Comp Sci & Engn Shanghai 200237 Peoples R China;

    East China Univ Sci & Technol Minist Educ Key Lab Adv Control & Optimizat Chem Proc Shanghai 200237 Peoples R China;

  • Indexing information
  • Original format: PDF
  • Language: English
  • Chinese Library Classification (CLC)
  • Keywords

    Critical instance; Nearest neighbor; Cited counts; Imbalanced problem; Instance selection;
