Journal: International Journal of Soft Computing

Similarity-Based Techniques for Text Document Classification

Abstract

In large-scale text classification, labeling a large number of training documents places a considerable burden on human experts, who must read each document and assign it to the appropriate categories. With this problem in mind, our goal was to develop a text categorization system that needs fewer labeled training examples to reach a given level of performance, using a similarity-based learning algorithm and thresholding strategies. Experimental results show that the proposed model is useful for building document categorization systems. Given the size of the corpus used, the system was designed as a small-scale implementation. It can be extended to larger data sets, and its efficiency can then be compared against the performance of currently available methods such as SVM and naive Bayes. Overall, the approach concentrates on categorizing small-scale document collections and performs that task completely.
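
The abstract names a similarity-based learning algorithm combined with thresholding but gives no details of either. The following is only a minimal sketch of one common way such a scheme can work, not the authors' method: it assumes bag-of-words term counts, cosine similarity against per-category centroids, and a single fixed score threshold. The function names, the threshold value of 0.2, and the toy documents are all hypothetical and chosen purely for illustration.

```python
import math
from collections import Counter, defaultdict


def vectorize(text):
    """Bag-of-words term-frequency vector (lowercased whitespace tokens)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def train_centroids(labeled_docs):
    """Sum the term vectors of the labeled documents in each category."""
    centroids = defaultdict(Counter)
    for text, category in labeled_docs:
        centroids[category].update(vectorize(text))
    return dict(centroids)


def classify(text, centroids, threshold=0.2):
    """Return every category whose centroid similarity meets the threshold.

    The threshold is an assumed, fixed cut-off; an empty list means the
    document is not similar enough to any category.
    """
    vec = vectorize(text)
    scores = {c: cosine(vec, centroid) for c, centroid in centroids.items()}
    return sorted(c for c, s in scores.items() if s >= threshold)


# Toy labeled data, invented for illustration only.
train = [
    ("stock market shares fall", "finance"),
    ("bank raises interest rates", "finance"),
    ("team wins football match", "sport"),
    ("player scores late goal", "sport"),
]
centroids = train_centroids(train)
labels = classify("bank shares fall after rates rise", centroids)
print(labels)
```

Because a document is assigned to every category that clears the threshold, this sketch allows multiple labels or none; raising the threshold trades recall for precision. In practice the threshold would be tuned on held-out labeled data rather than fixed by hand.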