Transfer synthetic over-sampling for class-imbalance learning with limited minority class data

Liu Xu-Ying; Wang Sheng-Tao; Zhang Min-Ling


Abstract

The problem of limited minority class data is encountered in many class-imbalanced applications but has received little attention. Synthetic over-sampling, a popular family of class-imbalance learning methods, can introduce considerable noise when the minority class has limited data, since the synthetic samples are not i.i.d. samples of the minority class. Most sophisticated synthetic sampling methods tackle this problem by denoising or by generating samples more consistent with the ground-truth data distribution, but their assumptions about the true noise or the ground-truth data distribution may not hold. To adapt synthetic sampling to the problem of limited minority class data, the proposed Traso framework treats synthetic minority class samples as an additional data source and exploits transfer learning to transfer knowledge from them to the minority class. As an implementation, the TrasoBoost method first generates synthetic samples to balance the class sizes. Then, in each boosting iteration, the weights of synthetic samples decrease and the weights of original data increase when they are misclassified, and remain unchanged otherwise. Misclassified synthetic samples are potential noise and thus have less influence in the following iterations. In addition, the weights of minority class instances change more than those of majority class instances, so that minority class instances are more influential. Only original data are used to estimate the error rate, so that the estimate is immune to noise. Finally, since the synthetic samples are highly related to the minority class, all of the weak learners are aggregated for prediction. Experimental results show that TrasoBoost outperforms many popular class-imbalance learning methods.
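
The weighting scheme described in the abstract can be sketched roughly as follows. This is an illustrative sketch only, not the authors' implementation: the abstract gives no formulas, so the function names (`smote_like`, `original_error`, `update_weights`), the AdaBoost-style factor `err / (1 - err)`, and the parameters `synth_decay` and `minority_power` are all assumptions chosen to reproduce the qualitative behaviour stated above. The over-sampler interpolates between random pairs of minority points, a simplification of SMOTE that skips the nearest-neighbour search.

```python
import numpy as np


def smote_like(X_min, n_new, rng):
    """Create n_new synthetic points by interpolating between random
    pairs of minority points (simplified SMOTE, no k-NN step)."""
    i = rng.integers(0, len(X_min), size=n_new)
    j = rng.integers(0, len(X_min), size=n_new)
    gap = rng.random((n_new, 1))  # interpolation position in [0, 1)
    return X_min[i] + gap * (X_min[j] - X_min[i])


def original_error(w, miss, is_synth):
    """Weighted error rate estimated on original (non-synthetic) data only."""
    orig = ~is_synth
    return np.sum(w[orig] * miss[orig]) / np.sum(w[orig])


def update_weights(w, miss, is_synth, is_minority, err,
                   synth_decay=0.5, minority_power=2.0):
    """One hypothetical weight update in the spirit of the abstract.

    - misclassified original samples: weight increases
      (more strongly for minority class instances)
    - misclassified synthetic samples: weight decreases
    - correctly classified samples: weight unchanged
    Weights are returned unnormalized.
    """
    err = np.clip(err, 1e-10, 0.5 - 1e-10)  # keep beta inside (0, 1)
    beta = err / (1.0 - err)
    # Assumed device: a larger exponent gives minority instances a larger change.
    power = np.where(is_minority, minority_power, 1.0)

    factor = np.ones_like(w, dtype=float)
    orig_miss = miss & ~is_synth
    synth_miss = miss & is_synth
    factor[orig_miss] = (1.0 / beta) ** power[orig_miss]
    factor[synth_miss] = synth_decay  # assumed constant decay for suspected noise
    return w * factor
```

In a full boosting loop, one would fit a weak learner on the weighted union of original and synthetic data, compute `miss` from its predictions, call `original_error` and `update_weights`, and finally combine all weak learners for prediction. How the learners are weighted in that combination is not specified in the abstract.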


Bibliographic details

Source: Frontiers of Computer Science in China, 2019, Issue 5, pp. 996-1009 (14 pages)
Authors: Liu Xu-Ying; Wang Sheng-Tao; Zhang Min-Ling
Author affiliations (listed identically for all three authors):
Southeast Univ, Sch Comp Sci & Engn, Nanjing 210096, Jiangsu, Peoples R China
Southeast Univ, Minist Educ, Key Lab Comp Network & Informat Integrat, Nanjing 210096, Jiangsu, Peoples R China
Collaborat Innovat Ctr Wireless Commun Technol, Nanjing 210096, Jiangsu, Peoples R China
Original format: PDF
Language of text: English
Keywords: machine learning; data mining; class imbalance; over sampling; boosting; transfer learning

