Breaking the Closed World Assumption in Text Classification

Abstract

Existing research on multiclass text classification mostly makes the closed world assumption, which focuses on designing accurate classifiers under the assumption that all test classes are known at training time. A more realistic scenario is to expect unseen classes during testing (open world). In this case, the goal is to design a learning system that classifies documents of the known classes into their respective classes and also rejects documents from unknown classes. This problem is called open (world) classification. This paper approaches the problem by reducing the open space risk while balancing the empirical risk. It proposes a new learning strategy, called center-based similarity (CBS) space learning (or CBS learning), to provide a novel solution to the problem. Extensive experiments across two datasets show that CBS learning gives promising results on multiclass open text classification compared to state-of-the-art baselines.
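The abstract describes CBS learning only at a high level. The Python sketch below illustrates the general idea of open classification in a center-based similarity space under stated assumptions: documents are assumed to arrive as numeric feature vectors, cosine similarity is used as the (illustrative) similarity measure, and the fixed `reject_threshold` is a hypothetical stand-in for the per-class decision boundaries that CBS learning would actually learn. It is not the authors' algorithm.

```python
import numpy as np

def class_centers(X, y):
    """Mean document vector per known class.
    X: (n_docs, n_features) array of document vectors; y: array of class labels."""
    classes = np.unique(y)
    centers = np.stack([X[y == c].mean(axis=0) for c in classes])
    return classes, centers

def center_similarities(X, centers):
    """Cosine similarity of every document to every class center."""
    eps = 1e-12
    Xn = X / (np.linalg.norm(X, axis=1, keepdims=True) + eps)
    Cn = centers / (np.linalg.norm(centers, axis=1, keepdims=True) + eps)
    return Xn @ Cn.T  # shape: (n_docs, n_classes)

def predict_open(X, classes, centers, reject_threshold=0.5):
    """Assign each document to its most similar known class, rejecting it as
    'unknown' when even the best similarity falls below reject_threshold
    (a hypothetical fixed cutoff standing in for the learned decision boundaries)."""
    sims = center_similarities(X, centers)
    labels = classes[sims.argmax(axis=1)].astype(object)
    labels[sims.max(axis=1) < reject_threshold] = "unknown"
    return labels
```

The point of the sketch is the rejection option: a document that lies far from every known class center is mapped to "unknown" instead of being forced into one of the training classes, which is the behaviour the open world setting requires.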
