Performance Comparison and Optimization of Text Document Classification using k-NN and Naïve Bayes Classification Techniques

Zulfany Erlisa Rasjid; Reina Setiawan


Abstract

In the current era, information is available in several different formats, such as text, image, video, audio and others. A corpus is a large collection of documents. Information Retrieval (IR) makes it possible to retrieve unstructured information and to perform automatic summarization, classification and clustering. This research focuses on data classification using two of the six approaches to data classification: k-NN (k-Nearest Neighbors) and Naïve Bayes. The text documents used are in XML format. The corpus used in this research was downloaded from the TREC Legal Track and contains more than three thousand text documents in over twenty classes. Of these classes, the six with the largest number of text documents were chosen. The data were processed using RapidMiner software, and the results show that the optimum value of k in k-NN occurs at k=13. With this value of k, the average accuracy reached 55.17 percent, which is better than the 39.01 percent obtained with Naïve Bayes.
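
The comparison described in the abstract can be illustrated with a small, self-contained Python sketch. It is not the authors' RapidMiner workflow: the whitespace tokenizer, the cosine similarity on raw term counts, the Laplace smoothing, the two toy classes, and every function name below are assumptions chosen for illustration. It only shows the general shape of the experiment: classify documents with k-NN and with multinomial Naïve Bayes, then pick the k that gives the highest accuracy on held-out documents.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy corpus (NOT the TREC Legal Track data used in the paper).
TRAIN = [
    ("contract breach damages court", "legal"),
    ("court ruling appeal judge", "legal"),
    ("lawsuit settlement court damages", "legal"),
    ("tobacco advertising campaign budget", "marketing"),
    ("advertising brand campaign sales", "marketing"),
    ("sales budget brand promotion", "marketing"),
]
TEST = [
    ("judge court appeal", "legal"),
    ("brand advertising sales", "marketing"),
]

def tokenize(text):
    """Lowercase and split on whitespace (a deliberately crude tokenizer)."""
    return text.lower().split()

def cosine(a, b):
    """Cosine similarity between two term-count Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def knn_predict(train, text, k):
    """Majority vote among the k training documents most similar to text."""
    query = Counter(tokenize(text))
    ranked = sorted(
        train,
        key=lambda pair: cosine(Counter(tokenize(pair[0])), query),
        reverse=True,
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

def nb_train(train):
    """Collect class frequencies, per-class term counts, and the vocabulary."""
    class_docs = Counter()
    term_counts = defaultdict(Counter)
    vocab = set()
    for text, label in train:
        class_docs[label] += 1
        for tok in tokenize(text):
            term_counts[label][tok] += 1
            vocab.add(tok)
    return class_docs, term_counts, vocab

def nb_predict(model, text):
    """Multinomial Naive Bayes with add-one (Laplace) smoothing, in log space."""
    class_docs, term_counts, vocab = model
    total_docs = sum(class_docs.values())
    best_label, best_score = None, -math.inf
    for label in sorted(class_docs):
        score = math.log(class_docs[label] / total_docs)
        denom = sum(term_counts[label].values()) + len(vocab)
        for tok in tokenize(text):
            if tok in vocab:  # tokens never seen in training are skipped
                score += math.log((term_counts[label][tok] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

def accuracy(predict, test):
    """Fraction of test documents whose predicted label matches the true one."""
    return sum(predict(text) == label for text, label in test) / len(test)

def best_k(train, validation, candidates):
    """Return the candidate k with the highest validation accuracy."""
    return max(
        candidates,
        key=lambda k: accuracy(lambda t: knn_predict(train, t, k), validation),
    )

if __name__ == "__main__":
    model = nb_train(TRAIN)
    k = best_k(TRAIN, TEST, [1, 3, 5])
    print("best k:", k)
    print("k-NN accuracy:", accuracy(lambda t: knn_predict(TRAIN, t, k), TEST))
    print("Naive Bayes accuracy:", accuracy(lambda t: nb_predict(model, t), TEST))
```

On a real corpus the k search would be run on a validation split or with cross-validation rather than on the test documents, and the term counts would normally be replaced by TF-IDF weights. The toy data is far too small to say anything about which classifier is better.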


Bibliographic record

Source: Procedia Computer Science, 2017, issue 22, 6 pages
Authors: Zulfany Erlisa Rasjid; Reina Setiawan
Original format: PDF
Chinese Library Classification: computing technology, computer technology
