Similarity-Based Techniques for Text Document Classification

S. Senthamarai Kannan; N. Ramaraj

Abstract

In large-scale text classification, labeling a large number of training documents places a considerable burden on human experts, who must read each document and assign it to the appropriate categories. With this problem in mind, our goal was to develop a text categorization system that needs fewer labeled training examples to reach a given level of performance, using a similarity-based learning algorithm and thresholding strategies. Experimental results show that the proposed model is useful for building document categorization systems. Given the size of the corpus used, the system was designed as a small-scale implementation. It can be extended to larger data sets, and its efficiency can then be evaluated against currently available methods such as SVM and naive Bayes. Overall, the approach concentrates on categorizing documents at a small scale and performs that task completely.
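
The abstract does not spell out the algorithm, so the following is only a minimal sketch of what a similarity-based classifier with a thresholding step could look like, not the authors' method. It assumes bag-of-words term-frequency vectors, one summed profile per category, cosine similarity, and a single fixed score threshold; the function names, the toy training data, and the threshold value 0.3 are all hypothetical choices made for illustration.

```python
import math
from collections import Counter, defaultdict


def vectorize(text):
    """Bag-of-words term-frequency vector (lowercased whitespace tokens)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def train(labeled_docs):
    """Sum the term vectors of each category's labeled examples into one profile."""
    profiles = defaultdict(Counter)
    for text, category in labeled_docs:
        profiles[category].update(vectorize(text))
    return dict(profiles)


def classify(text, profiles, threshold=0.3):
    """Return every category whose profile similarity meets the threshold.

    The fixed threshold is an assumed, hypothetical strategy; a document may
    receive zero, one, or several categories.
    """
    vec = vectorize(text)
    scores = {c: cosine(vec, p) for c, p in profiles.items()}
    return sorted(c for c, s in scores.items() if s >= threshold)


# Toy labeled examples (hypothetical, for illustration only).
training = [
    ("stock market shares trading", "finance"),
    ("bank interest rates loan", "finance"),
    ("football match goal league", "sports"),
    ("tennis match player serve", "sports"),
]
profiles = train(training)

print(classify("the bank raised interest rates", profiles))
print(classify("weather forecast rain", profiles))
```

In this sketch the threshold decides whether a document is assigned to a category at all, so a document that resembles no profile is left unlabeled rather than forced into the nearest category. In practice the threshold would be tuned on held-out labeled data, possibly per category.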

Bibliographic details

Source: International Journal of Soft Computing, 2008, Issue 1, 5 pages
Authors: S. Senthamarai Kannan; N. Ramaraj
Original format: PDF
Chinese Library Classification: Computing technology, computer technology
