
Automatic Document Categorization Based on k-NN and Object-Based Thesauri


Abstract

The k-NN classifier (k-NN) is one of the most popular document categorization methods because of its simplicity and relatively good performance. However, its precision degrades significantly when ambiguity arises, that is, when more than one candidate category exists for the document to be assigned. To remedy this drawback, we propose a new method that incorporates the relationships of object-based thesauri into document categorization with k-NN. Employing the thesaurus entails structuring the categories into taxonomies, since their structure must conform to that of the thesaurus in order to capture the relationships among them. By referencing the thesaurus relationships that correspond to the structured categories, k-NN can be improved drastically, because the ambiguity is removed. In this paper, we first perform document categorization with k-NN and then employ the relationships to reduce the ambiguity. Experimental results show that the proposed approach improves the precision of k-NN by up to 13.86% without compromising its recall.
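The abstract describes a two-stage procedure (k-NN first, thesaurus relationships second) without giving the details, so the following Python sketch is only an illustration under stated assumptions, not the authors' method. It assumes whitespace tokenization, cosine similarity over term counts, a hypothetical `ratio` threshold for deciding which categories count as candidates, and a hypothetical `parent` mapping standing in for the thesaurus taxonomy. The disambiguation rule (drop a candidate that is an ancestor of another candidate) is likewise an assumption chosen for simplicity.

```python
from collections import Counter
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def knn_candidates(doc, training, k=3, ratio=0.5):
    """Stage 1: plain k-NN. Returns every category whose summed neighbor
    similarity is at least `ratio` times the best category's score.
    More than one returned category means the assignment is ambiguous.
    (`ratio` is an assumption of this sketch.)"""
    vec = Counter(doc.lower().split())
    neighbors = sorted(
        ((cosine(vec, Counter(text.lower().split())), cat)
         for text, cat in training),
        reverse=True,
    )[:k]
    totals = Counter()
    for sim, cat in neighbors:
        totals[cat] += sim
    best = max(totals.values(), default=0.0)
    if best == 0.0:
        return []
    return sorted(c for c, s in totals.items() if s >= ratio * best)


def ancestors(cat, parent):
    """All categories above `cat` in the (assumed) taxonomy."""
    seen = []
    while cat in parent and parent[cat] not in seen:
        cat = parent[cat]
        seen.append(cat)
    return seen


def disambiguate(candidates, parent):
    """Stage 2: use taxonomy relationships to prune candidates.
    Assumed rule: a candidate that is an ancestor of another candidate
    is dropped, keeping the more specific category."""
    covered = {a for c in candidates for a in ancestors(c, parent)}
    return [c for c in candidates if c not in covered]


# Hypothetical toy data: child -> parent links and labeled documents.
PARENT = {"sedan": "car", "car": "vehicle", "truck": "vehicle"}
TRAINING = [
    ("four door sedan family car", "sedan"),
    ("car engine wheels road", "car"),
    ("truck cargo heavy load", "truck"),
    ("sedan compact car fuel", "sedan"),
]

candidates = knn_candidates("family sedan car", TRAINING, k=3, ratio=0.2)
resolved = disambiguate(candidates, PARENT)
```

On this toy data, stage 1 returns two candidates, one of which is the taxonomic parent of the other, and stage 2 keeps only the more specific one. Candidates that are unrelated in the taxonomy (for example siblings) pass through unchanged, so this particular rule can only remove candidates, never add them.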
