Transfer synthetic over-sampling for class-imbalance learning with limited minority class data


Abstract

The problem of limited minority class data arises in many class-imbalanced applications but has received little attention. Synthetic over-sampling, a popular family of class-imbalance learning methods, can introduce considerable noise when the minority class has limited data, because the synthetic samples are not i.i.d. samples of the minority class. Most sophisticated synthetic sampling methods tackle this problem by denoising or by generating samples that are more consistent with the ground-truth data distribution, but their assumptions about the true noise or the ground-truth data distribution may not hold. To adapt synthetic sampling to the problem of limited minority class data, the proposed Traso framework treats synthetic minority class samples as an additional data source and uses transfer learning to transfer knowledge from them to the minority class. As an implementation, the TrasoBoost method first generates synthetic samples to balance the class sizes. Then, in each boosting iteration, the weights of misclassified synthetic samples decrease and the weights of misclassified original data increase, while the weights of correctly classified instances remain unchanged. Misclassified synthetic samples are potential noise and thus have less influence in the following iterations. In addition, the weights of minority class instances change more than those of majority class instances, so that minority class instances are more influential. Only original data are used to estimate the error rate, which keeps the estimate immune to noise. Finally, since the synthetic samples are highly related to the minority class, all of the weak learners are aggregated for prediction. Experimental results show that TrasoBoost outperforms many popular class-imbalance learning methods.
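The abstract describes the TrasoBoost weight update only in words and gives no formulas, so the following is a rough sketch of that rule under stated assumptions and not the authors' implementation. The function names, the multiplicative factors `beta_syn` and `beta_orig`, and the `minority_power` exponent are all hypothetical choices made for illustration. The sketch shows the three ideas from the text: misclassified synthetic samples lose weight, misclassified original samples gain weight, and the error rate is computed on original data only.

```python
import numpy as np

def traso_weight_update(weights, misclassified, is_synthetic, is_minority,
                        beta_syn=0.5, beta_orig=0.5, minority_power=2.0):
    """One illustrative boosting-round update (all factors are assumptions)."""
    weights = np.asarray(weights, dtype=float)
    misclassified = np.asarray(misclassified, dtype=bool)
    is_synthetic = np.asarray(is_synthetic, dtype=bool)
    is_minority = np.asarray(is_minority, dtype=bool)

    # Assumption: minority instances get a larger exponent, so their
    # weights change more than those of majority instances.
    power = np.where(is_minority, minority_power, 1.0)

    factor = np.ones_like(weights)  # correctly classified: unchanged
    # Misclassified synthetic samples are potential noise: shrink them.
    syn_wrong = misclassified & is_synthetic
    factor[syn_wrong] = beta_syn ** power[syn_wrong]
    # Misclassified original samples are hard cases: grow them.
    orig_wrong = misclassified & ~is_synthetic
    factor[orig_wrong] = (1.0 / beta_orig) ** power[orig_wrong]

    # No renormalization here, so unchanged weights stay literally unchanged.
    return weights * factor

def original_only_error(weights, misclassified, is_synthetic):
    """Weighted error rate estimated on original (non-synthetic) data only."""
    weights = np.asarray(weights, dtype=float)
    misclassified = np.asarray(misclassified, dtype=bool)
    orig = ~np.asarray(is_synthetic, dtype=bool)
    return np.sum(weights[orig] * misclassified[orig]) / np.sum(weights[orig])

# Toy round: one synthetic minority sample and three original samples.
w = [1.0, 1.0, 1.0, 1.0]
wrong = [True, True, True, False]
synthetic = [True, False, False, False]
minority = [True, True, False, True]

new_w = traso_weight_update(w, wrong, synthetic, minority)
err = original_only_error(w, wrong, synthetic)
print(new_w, err)
```

In this toy round the misclassified synthetic sample is scaled by `0.5 ** 2`, the misclassified original minority sample by `2 ** 2`, and the misclassified original majority sample by `2`, while the correctly classified sample keeps its weight. A real implementation would also need a weak learner, a synthetic sample generator, and the aggregation of all weak learners, none of which the abstract specifies in detail.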

Bibliographic details

  • Source
    Frontiers of Computer Science in China | 2019, Issue 5 | pp. 996-1009 | 14 pages
  • Author affiliations

    Southeast Univ, Sch Comp Sci & Engn, Nanjing 210096, Jiangsu, Peoples R China|Southeast Univ, Minist Educ, Key Lab Comp Network & Informat Integrat, Nanjing 210096, Jiangsu, Peoples R China|Collaborat Innovat Ctr Wireless Commun Technol, Nanjing 210096, Jiangsu, Peoples R China;

    Southeast Univ, Sch Comp Sci & Engn, Nanjing 210096, Jiangsu, Peoples R China|Southeast Univ, Minist Educ, Key Lab Comp Network & Informat Integrat, Nanjing 210096, Jiangsu, Peoples R China|Collaborat Innovat Ctr Wireless Commun Technol, Nanjing 210096, Jiangsu, Peoples R China;

    Southeast Univ, Sch Comp Sci & Engn, Nanjing 210096, Jiangsu, Peoples R China|Southeast Univ, Minist Educ, Key Lab Comp Network & Informat Integrat, Nanjing 210096, Jiangsu, Peoples R China|Collaborat Innovat Ctr Wireless Commun Technol, Nanjing 210096, Jiangsu, Peoples R China;

  • Indexing information
  • Original format: PDF
  • Language: English
  • Chinese Library Classification
  • Keywords

    machine learning; data mining; class imbalance; over sampling; boosting; transfer learning;

