International Symposium on Knowledge and Systems Sciences (KSS'2006), September 22–25, 2006, Beijing, China

THE IMPROVEMENT OF NAIVE BAYESIAN CLASSIFIER BASED ON THE STRATEGY OF FEATURE SELECTION AND SAMPLE CLEANING



Abstract

Naive Bayesian Classifier (NBC) is a simple and effective classification model. Although it offers advantages over many other classifiers, it does not always yield satisfactory results. In this paper, we summarize previous methods for improving the NBC model and then propose three improvement strategies: a feature selection strategy, a sample cleaning strategy, and a mixed strategy. The first method reduces the dimensionality of the dataset by choosing an optimized feature subset according to the feature important factor (FIF) of each feature; the second deletes noisy samples from the training dataset according to a sample polluting factor; the third combines the two, performing feature selection first and then sample cleaning. Experimental comparison and analysis on datasets from the UCI repository show that these strategies are effective. On average, using only 36.76% of the features in the original feature set, the first method raises prediction accuracy by 2.30%; using 92.57% of the samples in the training dataset, the second method raises prediction accuracy by 1.59%; the third method increases prediction accuracy by 2.55%. Among the three, the mixed strategy shows the clearest advantage, reducing the complexity of the model while increasing the prediction accuracy of the NBC model.
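The abstract does not give the exact definitions of the feature important factor (FIF) or the sample polluting factor, so the following is only an illustrative Python sketch of the pipeline it describes: mutual information between a feature and the class label stands in for the FIF, and a training sample is treated as "polluting" when a naive Bayes model fit on the training data misclassifies it. The names `fif`, `select_features`, `NBC`, and `clean_samples` are invented for this sketch and are not from the paper.

```python
import math
from collections import Counter, defaultdict

def fif(X, y, j):
    """Stand-in feature-important-factor: mutual information between
    feature j and the class label (the paper's exact FIF definition
    is not given in the abstract)."""
    n = len(y)
    pxy = Counter((X[i][j], y[i]) for i in range(n))
    px = Counter(X[i][j] for i in range(n))
    py = Counter(y)
    return sum((c / n) * math.log((c / n) / ((px[xv] / n) * (py[yv] / n)))
               for (xv, yv), c in pxy.items())

def select_features(X, y, k):
    """Feature-selection strategy: keep the k features ranked highest
    by the (stand-in) FIF, reducing the dataset's dimensionality."""
    ranked = sorted(range(len(X[0])), key=lambda j: fif(X, y, j), reverse=True)
    keep = sorted(ranked[:k])
    return [[row[j] for j in keep] for row in X], keep

class NBC:
    """Minimal categorical naive Bayes with Laplace-style smoothing."""
    def fit(self, X, y):
        self.classes = sorted(set(y))
        self.prior = {c: y.count(c) / len(y) for c in self.classes}
        self.counts = defaultdict(Counter)  # (class, feature) -> value counts
        for row, c in zip(X, y):
            for j, v in enumerate(row):
                self.counts[(c, j)][v] += 1
        return self

    def _logp(self, row, c):
        lp = math.log(self.prior[c])
        for j, v in enumerate(row):
            cnt = self.counts[(c, j)]
            # +1 in the denominator reserves mass for unseen values
            lp += math.log((cnt[v] + 1) / (sum(cnt.values()) + len(cnt) + 1))
        return lp

    def predict(self, row):
        return max(self.classes, key=lambda c: self._logp(row, c))

def clean_samples(X, y):
    """Sample-cleaning strategy (a sketch): drop training samples that
    an NBC fit on the full training set misclassifies, as a crude
    proxy for the paper's 'sample polluting factor'."""
    m = NBC().fit(X, y)
    kept = [(r, c) for r, c in zip(X, y) if m.predict(r) == c]
    return [r for r, _ in kept], [c for _, c in kept]
```

The mixed strategy described in the abstract would then be `clean_samples(*select_features(X, y, k)[0:1], ...)` applied in sequence: select features first, refit, then clean the reduced training set.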
