
TEXT CATEGORIZATION WITH KNOWLEDGE TRANSFER FROM HETEROGENEOUS DATASETS


Abstract

The present invention provides a method for incorporating features from heterogeneous auxiliary datasets into input text data for use in classification. After input text data is received, a plurality of heterogeneous auxiliary datasets, such as labeled datasets and unlabeled datasets, are accessed. A plurality of features are extracted from each of the heterogeneous auxiliary datasets. These features are combined with the input text data to generate a set of features that may be used to classify the input text data. Classification features are then extracted from this set of features and used to classify the input text data. In one embodiment, the classification features are extracted by calculating a mutual information value for each feature in the set of features and identifying the features whose mutual information value exceeds a threshold value.
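The abstract does not say how features are drawn from the auxiliary datasets or which classifier is used, so the following is only a minimal sketch of the described pipeline under stated assumptions. `AUX_DATASETS` is a hypothetical pair of toy dictionaries standing in for the heterogeneous labeled and unlabeled datasets. Features are binary (present or absent), mutual information is measured in bits against the class label, and a Laplace-smoothed naive Bayes classifier is swapped in purely for illustration. All function names are hypothetical and are not taken from the patent.

```python
import math
from collections import Counter, defaultdict

# Hypothetical stand-ins for heterogeneous auxiliary datasets: each maps a
# word to an extra feature value (e.g. a topic label learned from a labeled
# corpus, or a word cluster induced from an unlabeled corpus).
AUX_DATASETS = {
    "topic": {"goal": "SPORT", "match": "SPORT", "stock": "FINANCE", "market": "FINANCE"},
    "cluster": {"goal": "c1", "team": "c1", "bank": "c2", "stock": "c2"},
}


def extract_features(text):
    """Combine the text's own words with features drawn from each auxiliary dataset."""
    words = text.lower().split()
    feats = {"w=" + w for w in words}
    for name, mapping in AUX_DATASETS.items():
        feats.update(f"{name}={mapping[w]}" for w in words if w in mapping)
    return feats


def mutual_information(docs, labels, feature):
    """Mutual information (bits) between a binary feature indicator and the class label."""
    n = len(docs)
    joint = Counter((feature in d, y) for d, y in zip(docs, labels))
    p_x = Counter(feature in d for d in docs)
    p_y = Counter(labels)
    mi = 0.0
    for (x, y), count in joint.items():
        p_xy = count / n
        mi += p_xy * math.log2(p_xy / ((p_x[x] / n) * (p_y[y] / n)))
    return mi


def select_features(docs, labels, threshold):
    """Keep features whose mutual information with the label exceeds the threshold."""
    candidates = set().union(*docs)
    return {f for f in candidates if mutual_information(docs, labels, f) > threshold}


def train(docs, labels, selected):
    """Count selected features per class (naive Bayes, assumed here for illustration)."""
    feature_counts = defaultdict(Counter)
    for d, y in zip(docs, labels):
        feature_counts[y].update(d & selected)
    return feature_counts, Counter(labels)


def classify(text, model, selected):
    """Return the class with the highest Laplace-smoothed naive Bayes log score."""
    feature_counts, class_counts = model
    feats = extract_features(text) & selected
    total = sum(class_counts.values())
    best, best_score = None, -math.inf
    for y, n_y in class_counts.items():
        denom = sum(feature_counts[y].values()) + len(selected)
        score = math.log(n_y / total)
        score += sum(math.log((feature_counts[y][f] + 1) / denom) for f in feats)
        if score > best_score:
            best, best_score = y, score
    return best


train_texts = ["the team scored a goal", "great match for the team",
               "stock prices fell", "the bank and the market"]
train_labels = ["sports", "sports", "finance", "finance"]
docs = [extract_features(t) for t in train_texts]
selected = select_features(docs, train_labels, threshold=0.5)
model = train(docs, train_labels, selected)
print(classify("a goal in the match", model, selected))  # sports
```

In this toy run, the word features `w=goal` and `w=match` each occur in only one training document. Their mutual information (about 0.31 bits) falls below the 0.5 threshold, so they are discarded. The test sentence is still labeled `sports` because the auxiliary features `topic=SPORT` and `cluster=c1` survive selection. This is the kind of knowledge transfer from auxiliary data that the abstract describes.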

Bibliographic data

  • Publication number: US2009171956A1
  • Publication date: 2009-07-02
  • Applicants/patentees: RAKESH GUPTA; LEV RATINOV
  • Application number: US20080249809
  • Inventors: RAKESH GUPTA; LEV RATINOV
  • Filing date: 2008-10-10
  • Classification: G06F17/30
  • Country: US
  • Date added to database: 2022-08-21 19:34:15
