Published in: International Journal of Computer Science and Network Security

Feature Extraction based Text Classification using K-Nearest Neighbor Algorithm

Abstract

The number of scientific publications has been increasing enormously, and with this increase the classification of scientific publications is becoming a challenging task. The core objective of this research is to analyze the performance of classification algorithms using a Scopus dataset. In text classification, extracting features from documents and classifying documents using the extracted features are the major issues that degrade the performance of different algorithms. In this paper, classification algorithms such as Naïve Bayes (NB) and K-Nearest Neighbor (K-NN) showed improved performance when combined with Bayesian boosting and bagging. The selected classification algorithms were evaluated on 10K documents from Scopus using the F-measure, and comparison matrices were produced to estimate accuracy, precision, and recall for the NB and K-NN classifiers. Further, data preprocessing and cleaning steps are applied to the selected dataset, and class imbalance issues are analyzed to increase the performance of text classification algorithms. Experimental results showed a performance gain of over 7% with K-NN, which outperformed NB.
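As a rough illustration of the kind of pipeline the abstract describes, the following minimal sketch classifies short texts with K-NN and scores the result with precision, recall, and F-measure. It is not the authors' implementation: the term-frequency features, the cosine similarity, the value k=3, the toy documents, and all function names are assumptions made for this example, and the Bayesian boosting and bagging steps are not shown.

```python
import math
from collections import Counter


def tf_vector(text):
    """Term-frequency vector (assumed feature extraction: lowercase, split on whitespace)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def knn_predict(train, text, k=3):
    """Majority vote among the k training documents most similar to `text`.

    `train` is a list of (tf_vector, label) pairs.
    """
    query = tf_vector(text)
    ranked = sorted(train, key=lambda item: cosine(query, item[0]), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


def precision_recall_f1(y_true, y_pred, positive):
    """Precision, recall, and F-measure (F1) for one class treated as positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Toy data invented for this sketch; it is not drawn from the Scopus dataset.
TRAIN_DOCS = [
    ("neural network training deep learning", "cs"),
    ("graph algorithm complexity sorting", "cs"),
    ("compiler software network protocol", "cs"),
    ("protein gene cell expression", "bio"),
    ("cell membrane protein folding", "bio"),
    ("gene sequencing dna mutation", "bio"),
]
TEST_DOCS = [
    ("deep learning network", "cs"),
    ("dna gene mutation", "bio"),
    ("protein cell", "bio"),
    ("sorting algorithm", "cs"),
]

train = [(tf_vector(text), label) for text, label in TRAIN_DOCS]
y_true = [label for _, label in TEST_DOCS]
y_pred = [knn_predict(train, text, k=3) for text, _ in TEST_DOCS]

if __name__ == "__main__":
    for cls in ("cs", "bio"):
        p, r, f = precision_recall_f1(y_true, y_pred, cls)
        print(f"{cls}: precision={p:.2f} recall={r:.2f} F1={f:.2f}")
```

On a real corpus one would likely replace the raw term counts with a weighted scheme such as TF-IDF and choose k by validation; those choices are outside what the abstract specifies.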
