Transfer synthetic over-sampling for class-imbalance learning with limited minority class data


Abstract

The problem of limited minority class data arises in many class-imbalanced applications but has received little attention. Synthetic over-sampling, a popular family of class-imbalance learning methods, can introduce considerable noise when the minority class has limited data, because the synthetic samples are not i.i.d. samples of the minority class. Most sophisticated synthetic sampling methods tackle this problem by denoising or by generating samples more consistent with the ground-truth data distribution, but their assumptions about the true noise or the ground-truth distribution may not hold. To adapt synthetic sampling to limited minority class data, the proposed Traso framework treats synthetic minority class samples as an additional data source and uses transfer learning to transfer knowledge from them to the minority class. As an implementation, the TrasoBoost method first generates synthetic samples to balance the class sizes. Then, in each boosting iteration, the weights of misclassified synthetic samples decrease and the weights of misclassified original data increase, while the weights of correctly classified instances remain unchanged. Misclassified synthetic samples are potential noise and thus have less influence in subsequent iterations. In addition, the weights of minority class instances change more than those of majority class instances, so that minority instances become more influential. Only the original data are used to estimate the error rate, which keeps the estimate immune to noise in the synthetic samples. Finally, since the synthetic samples are highly related to the minority class, all of the weak learners are aggregated for prediction. Experimental results show that TrasoBoost outperforms many popular class-imbalance learning methods.
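
The reweighting rule described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `update_weights` and `original_error_rate`, the multiplicative factors `beta_syn` and `beta_orig`, and the `minority_boost` exponent are all hypothetical choices made for illustration, since the abstract does not state the exact update formulas. The sketch only reproduces the qualitative behavior: misclassified synthetic samples lose weight, misclassified original samples gain weight, minority instances change more than majority instances, and the error rate is computed on original data only.

```python
import numpy as np


def update_weights(weights, misclassified, is_synthetic, is_minority,
                   beta_syn=0.8, beta_orig=1.25, minority_boost=2.0):
    """One illustrative reweighting step (all factors are assumptions).

    beta_syn < 1 shrinks misclassified synthetic samples,
    beta_orig > 1 grows misclassified original samples,
    minority_boost > 1 makes minority weights change more.
    """
    weights = np.asarray(weights, dtype=float)
    misclassified = np.asarray(misclassified, dtype=bool)
    is_synthetic = np.asarray(is_synthetic, dtype=bool)
    is_minority = np.asarray(is_minority, dtype=bool)

    # Synthetic samples are down-weighted, original samples up-weighted.
    base = np.where(is_synthetic, beta_syn, beta_orig)
    # A larger exponent for minority instances amplifies their change.
    exponent = np.where(is_minority, minority_boost, 1.0)
    # Correctly classified instances keep their weight (factor 1).
    factor = np.where(misclassified, base ** exponent, 1.0)
    return weights * factor


def original_error_rate(weights, misclassified, is_synthetic):
    """Weighted error rate estimated on original (non-synthetic) data only."""
    weights = np.asarray(weights, dtype=float)
    misclassified = np.asarray(misclassified, dtype=bool)
    original = ~np.asarray(is_synthetic, dtype=bool)
    w = weights[original]
    return float(np.sum(w * misclassified[original]) / np.sum(w))


if __name__ == "__main__":
    w = [1.0, 1.0, 1.0, 1.0]
    wrong = [True, True, True, False]
    synthetic = [True, False, False, False]
    minority = [True, True, False, True]
    print(update_weights(w, wrong, synthetic, minority))
    print(original_error_rate(w, wrong, synthetic))
```

In this toy call, the misclassified synthetic sample shrinks, the misclassified original minority sample grows more than the misclassified majority sample, and the correctly classified sample is unchanged. A full boosting loop would also fit a weak learner per iteration and derive the factors from the estimated error rate, which the abstract does not specify.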

Bibliographic information

  • Source
    Frontiers of Computer Science | 2019, Issue 5 | pp. 996-1009 | 14 pages
  • Author affiliations

    Southeast Univ Sch Comp Sci & Engn Nanjing 210096 Jiangsu Peoples R China|Southeast Univ Minist Educ Key Lab Comp Network & Informat Integrat Nanjing 210096 Jiangsu Peoples R China|Collaborat Innovat Ctr Wireless Commun Technol Nanjing 210096 Jiangsu Peoples R China;

    Southeast Univ Sch Comp Sci & Engn Nanjing 210096 Jiangsu Peoples R China|Southeast Univ Minist Educ Key Lab Comp Network & Informat Integrat Nanjing 210096 Jiangsu Peoples R China|Collaborat Innovat Ctr Wireless Commun Technol Nanjing 210096 Jiangsu Peoples R China;

    Southeast Univ Sch Comp Sci & Engn Nanjing 210096 Jiangsu Peoples R China|Southeast Univ Minist Educ Key Lab Comp Network & Informat Integrat Nanjing 210096 Jiangsu Peoples R China|Collaborat Innovat Ctr Wireless Commun Technol Nanjing 210096 Jiangsu Peoples R China;

  • Indexing information
  • Original format: PDF
  • Language: English
  • Chinese Library Classification
  • Keywords

    machine learning; data mining; class imbalance; over sampling; boosting; transfer learning;

