Feature Extraction based Text Classification using K-Nearest Neighbor Algorithm

Muhammad Azam; Tanvir Ahmed; Fahad Sabah; Muhammad Iftikhar Hussain

Abstract

The number of scientific publications has been increasing enormously, and with this increase, classifying scientific publications is becoming a challenging task. The core objective of this research is to analyze the performance of classification algorithms on a Scopus dataset. In text classification, extracting features from documents and classifying documents with the extracted features are the major issues that degrade the performance of different algorithms. In this paper, classification algorithms such as Naïve Bayes (NB) and K-Nearest Neighbor (K-NN) showed improved performance when combined with Bayesian boosting and bagging. The selected classification algorithms were evaluated on 10K documents from Scopus using F-measure, and comparison matrices were produced to estimate accuracy, precision, and recall for the NB and K-NN classifiers. Further, data preprocessing and cleaning steps were applied to the selected dataset, and class imbalance issues were analyzed to increase the performance of the text classification algorithms. Experimental results showed a performance improvement of over 7% with K-NN, which performed better than NB.
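As a rough illustration of the ideas named in the abstract (bag-of-words feature extraction, K-NN classification, and evaluation with precision, recall, and F-measure), here is a minimal sketch in plain Python. It is not the authors' pipeline: the keywords suggest they worked in RapidMiner on Scopus data, and bagging and Bayesian boosting are not shown. The toy documents, the labels `cs` and `bio`, the whitespace tokenizer, the cosine similarity measure, and the choice of `k=3` are all assumptions made for this example.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a stand-in for real preprocessing)."""
    return text.lower().split()


def cosine(a, b):
    """Cosine similarity between two term-count vectors stored as Counters."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def knn_predict(train, text, k=3):
    """Majority vote among the k training documents most similar to `text`."""
    query = Counter(tokenize(text))
    ranked = sorted(
        train,
        key=lambda pair: cosine(query, Counter(tokenize(pair[0]))),
        reverse=True,
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


def precision_recall_f1(y_true, y_pred, positive):
    """Precision, recall, and F-measure for one class treated as positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


# Hypothetical toy corpus, invented for illustration only.
train = [
    ("neural network training deep learning", "cs"),
    ("graph algorithm complexity sorting", "cs"),
    ("compiler parsing programming language", "cs"),
    ("protein gene cell expression", "bio"),
    ("cell membrane protein folding", "bio"),
    ("gene sequencing dna mutation", "bio"),
]
test = [
    ("deep learning network", "cs"),
    ("dna gene expression", "bio"),
    ("sorting algorithm", "cs"),
    ("protein cell", "bio"),
]

y_true = [label for _, label in test]
predictions = [knn_predict(train, text, k=3) for text, _ in test]
precision, recall, f1 = precision_recall_f1(y_true, predictions, positive="cs")
print(predictions, precision, recall, f1)
```

On a real corpus, the raw term counts would typically be replaced by weighted features (for example TF-IDF), and the metrics would be computed per class so that class imbalance, which the abstract flags as a concern, is visible in the results.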

Bibliographic details

Source: International Journal of Computer Science and Network Security, 2018, Issue 12, 7 pages
Authors: Muhammad Azam; Tanvir Ahmed; Fahad Sabah; Muhammad Iftikhar Hussain
Original format: PDF
CLC classification: Automation technology, computer technology
Keywords: K-NN; naïve Bayes; text classification; RapidMiner; feature extraction
