Journal: 《计算机应用与软件》 (Computer Applications and Software)

An Ensemble-Based Active Learning Algorithm for Imbalanced Data Classification

         

Abstract

Preprocessing is currently one of the main ways to handle class-imbalanced data: the dataset is balanced first and then trained with a conventional method. The main preprocessing methods are over-sampling and under-sampling, but each has its own shortcomings. This paper proposes a split-boost active learning algorithm, SBAL (Split-Boost Active Learning). The algorithm splits the majority-class sample set into multiple subsets according to the imbalance ratio, merges each subset with the minority-class sample set, and trains a sub-classifier on each merged set with AdaBoost. The sub-classifiers are then combined into an overall classifier, and effective samples are actively selected for training with the QBC (Query-by-Committee) active learning algorithm. This largely avoids the shortcomings caused by adding or removing samples. Experiments show that the proposed algorithm achieves higher classification accuracy on imbalanced data.
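The pipeline described in the abstract can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation. The abstract does not specify the base learner, the number of boosting rounds, the disagreement measure, or how queried samples are fed back, so the following are all assumptions: decision stumps as weak learners, 10 boosting rounds, vote entropy as the QBC disagreement score, an `oracle` callback that supplies labels, labels in {-1, +1} with +1 as the minority class, and a simple majority vote to combine sub-classifiers. All function names (`split_majority`, `adaboost`, `vote_entropy`, `sbal_sketch`, and so on) are hypothetical.

```python
import math
import random
from collections import Counter

def split_majority(majority, minority, rng):
    """Split the majority class into about len(majority)/len(minority)
    subsets and merge each subset with the full minority set."""
    k = max(1, round(len(majority) / len(minority)))
    shuffled = majority[:]
    rng.shuffle(shuffled)
    return [shuffled[i::k] + minority for i in range(k)]

def stump_output(x, f, thr, sign):
    """Decision stump: `sign` if feature f is >= thr, else -sign."""
    return sign if x[f] >= thr else -sign

def train_stump(data, weights):
    """Pick the (feature, threshold, sign) with the lowest weighted error."""
    best = None
    for f in range(len(data[0][0])):
        for thr in sorted({x[f] for x, _ in data}):
            for sign in (1, -1):
                err = sum(w for (x, y), w in zip(data, weights)
                          if stump_output(x, f, thr, sign) != y)
                if best is None or err < best[0]:
                    best = (err, f, thr, sign)
    return best

def adaboost(data, rounds=10):
    """Discrete AdaBoost over stumps (assumed weak learner)."""
    weights = [1.0 / len(data)] * len(data)
    model = []
    for _ in range(rounds):
        err, f, thr, sign = train_stump(data, weights)
        err = min(max(err, 1e-10), 1 - 1e-10)  # avoid log(0) and division by 0
        alpha = 0.5 * math.log((1 - err) / err)
        model.append((alpha, f, thr, sign))
        # Increase the weight of misclassified samples, then renormalize.
        weights = [w * math.exp(-alpha * y * stump_output(x, f, thr, sign))
                   for (x, y), w in zip(data, weights)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return model

def predict(model, x):
    """Weighted vote of the stumps in one boosted sub-classifier."""
    score = sum(a * stump_output(x, f, t, s) for a, f, t, s in model)
    return 1 if score >= 0 else -1

def committee_predict(committee, x):
    """Overall classifier: majority vote of the sub-classifiers (assumed)."""
    return 1 if sum(predict(m, x) for m in committee) >= 0 else -1

def vote_entropy(committee, x):
    """QBC disagreement score: entropy of the committee's votes on x."""
    counts = Counter(predict(m, x) for m in committee)
    n = len(committee)
    return -sum(c / n * math.log(c / n) for c in counts.values())

def train_committee(majority, minority, rng):
    return [adaboost(s) for s in split_majority(majority, minority, rng)]

def sbal_sketch(majority, minority, pool, oracle, queries=3, seed=0):
    """Train, query the most disputed pool sample, add it, retrain."""
    rng = random.Random(seed)
    majority, minority, pool = list(majority), list(minority), list(pool)
    committee = train_committee(majority, minority, rng)
    for _ in range(min(queries, len(pool))):
        x = max(pool, key=lambda p: vote_entropy(committee, p))
        pool.remove(x)
        y = oracle(x)  # hypothetical labeling callback
        (minority if y == 1 else majority).append((x, y))
        committee = train_committee(majority, minority, rng)
    return committee
```

Each sample is a `(features, label)` pair with `features` a tuple of numbers, and `pool` holds unlabeled feature tuples. Because every sub-classifier sees the full minority set but only a slice of the majority set, no sample is synthesized or permanently discarded, which is the property the abstract contrasts with over-sampling and under-sampling.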
