Transfer synthetic over-sampling for class-imbalance learning with limited minority class data

Liu Xu-Ying; Wang Sheng-Tao; Zhang Min-Ling


Abstract

Limited minority class data is a problem in many class-imbalanced applications, yet it has received little attention. Synthetic over-sampling, a popular family of class-imbalance learning methods, can introduce considerable noise when the minority class has limited data, because the synthetic samples are not i.i.d. samples of the minority class. Most sophisticated synthetic sampling methods tackle this problem by denoising or by generating samples that are more consistent with the ground-truth data distribution, but their assumptions about the true noise or the ground-truth distribution may not hold. To adapt synthetic sampling to limited minority class data, the proposed Traso framework treats synthetic minority class samples as an additional data source and uses transfer learning to transfer knowledge from them to the minority class. As an implementation, the TrasoBoost method first generates synthetic samples to balance the class sizes. Then, in each boosting iteration, the weight of a misclassified synthetic sample decreases and the weight of a misclassified original instance increases; the weights of correctly classified instances remain unchanged. Misclassified synthetic samples are potential noise and therefore have less influence in subsequent iterations. In addition, the weights of minority class instances change more than those of majority class instances, so that minority class instances become more influential. Only original data are used to estimate the error rate, which keeps the estimate immune to noise. Finally, because the synthetic samples are highly related to the minority class, all weak learners are aggregated for prediction. Experimental results show that TrasoBoost outperforms many popular class-imbalance learning methods.
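
The reweighting rule in the abstract can be illustrated with a minimal sketch. The abstract does not state the exact update factors, so everything numeric here is an assumption: the function name `traso_style_update`, the fixed shrink factor `beta_syn`, the exponent `minority_boost`, and the AdaBoost-style factor `beta_orig` are hypothetical choices, not the paper's formulas. The sketch only mirrors the qualitative behaviour: misclassified synthetic samples lose weight, misclassified original instances gain weight, minority instances change more, and the error rate uses original data only.

```python
def traso_style_update(weights, is_synthetic, is_minority, misclassified,
                       beta_syn=0.5, minority_boost=2.0):
    """One hypothetical reweighting round in the spirit of TrasoBoost.

    All factors are illustrative assumptions; the abstract gives only
    the direction of each change, not its size.
    """
    # Error rate is estimated on ORIGINAL data only, so noisy synthetic
    # samples cannot distort it.
    orig = [(w, m) for w, s, m in zip(weights, is_synthetic, misclassified)
            if not s]
    orig_total = sum(w for w, _ in orig)
    orig_error = sum(w for w, m in orig if m)
    eps = orig_error / orig_total
    # Clip so the AdaBoost-style factor below stays in (0, 1).
    eps = min(max(eps, 1e-10), 0.499)
    beta_orig = eps / (1.0 - eps)  # assumed factor, borrowed from AdaBoost

    updated = []
    for w, syn, mino, mis in zip(weights, is_synthetic, is_minority,
                                 misclassified):
        if not mis:
            updated.append(w)  # correctly classified: unchanged
            continue
        # Minority instances get a larger change than majority ones.
        power = minority_boost if mino else 1.0
        if syn:
            updated.append(w * beta_syn ** power)   # potential noise: shrink
        else:
            updated.append(w / beta_orig ** power)  # hard original: grow
    # Weights are returned unnormalized; a full booster would rescale
    # them to a distribution before fitting the next weak learner.
    return updated, eps


# Toy round: one synthetic minority sample, one original minority
# instance, and four original majority instances.
new_weights, error_rate = traso_style_update(
    weights=[1.0] * 6,
    is_synthetic=[True, False, False, False, False, False],
    is_minority=[True, True, False, False, False, False],
    misclassified=[True, True, True, False, False, False],
)
```

In this toy round the misclassified synthetic sample ends up with the smallest weight and the misclassified original minority instance with the largest, which is the ordering the abstract describes.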


Bibliographic details

Source
Frontiers of Computer Science | 2019, Issue 5 | pp. 996-1009 | 14 pages
Authors
Liu Xu-Ying; Wang Sheng-Tao; Zhang Min-Ling
Author affiliations

Southeast Univ, Sch Comp Sci & Engn, Nanjing 210096, Jiangsu, Peoples R China; Southeast Univ, Minist Educ, Key Lab Comp Network & Informat Integrat, Nanjing 210096, Jiangsu, Peoples R China; Collaborat Innovat Ctr Wireless Commun Technol, Nanjing 210096, Jiangsu, Peoples R China
Original format: PDF
Language: English
Keywords
machine learning; data mining; class imbalance; over-sampling; boosting; transfer learning


