Text categorization with knowledge transfer from heterogeneous datasets

Abstract

The present invention provides a method for incorporating features from heterogeneous auxiliary datasets into input text data for use in classification. Heterogeneous auxiliary datasets, such as labeled datasets and unlabeled datasets, are accessed after receiving input text data. Features are extracted from each of the heterogeneous auxiliary datasets. The features are combined with the input text data to generate a set of features which may potentially be used to classify the input text data. Classification features are then extracted from the set of features and used to classify the input text data. In one embodiment, the classification features are extracted by calculating a mutual information value associated with each feature in the set of features and identifying features having a mutual information value exceeding a threshold value.
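
The selection step described in the abstract (score each candidate feature by mutual information, keep those above a threshold) can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: it assumes a small labeled set of examples is available for scoring, treats each feature as a binary present/absent variable, and models an auxiliary dataset as a simple word-to-feature lookup. The names `combine`, `mutual_information`, `select_features`, and the `aux:` prefix are hypothetical.

```python
import math
from collections import Counter


def combine(tokens, aux_lookup):
    """Merge input tokens with features derived from an auxiliary lookup.

    `aux_lookup` is a hypothetical stand-in for features extracted from an
    auxiliary dataset: it maps a token to one extra feature name.
    """
    features = set(tokens)
    features.update(aux_lookup[t] for t in tokens if t in aux_lookup)
    return features


def mutual_information(docs, labels, feature):
    """Mutual information (in bits) between presence of `feature` and the label."""
    n = len(docs)
    joint = Counter((feature in doc, label) for doc, label in zip(docs, labels))
    count_f = Counter()  # marginal counts for feature present / absent
    count_c = Counter()  # marginal counts for each class label
    for (present, label), k in joint.items():
        count_f[present] += k
        count_c[label] += k
    mi = 0.0
    for (present, label), k in joint.items():
        p_joint = k / n
        p_indep = (count_f[present] / n) * (count_c[label] / n)
        mi += p_joint * math.log2(p_joint / p_indep)
    return mi


def select_features(docs, labels, threshold):
    """Keep every candidate feature whose mutual information exceeds `threshold`."""
    candidates = set().union(*docs)
    return {f for f in candidates if mutual_information(docs, labels, f) > threshold}


# Toy data, invented for illustration only.
raw = [["open", "door"], ["open", "window"], ["cook", "rice"], ["cook", "door"]]
labels = ["a", "a", "b", "b"]
aux = {"rice": "aux:food"}  # assumed auxiliary-derived feature

docs = [combine(tokens, aux) for tokens in raw]
selected = select_features(docs, labels, threshold=0.5)
```

On this toy data, `open` and `cook` each perfectly predict the label (1 bit of mutual information) and are kept, while `door` appears equally in both classes (0 bits) and is dropped. The threshold of 0.5 is arbitrary; the abstract does not state a value.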

Bibliographic details

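  • Publication number: US8103671B2
  • Patent type: (not given)
  • Publication date: 2012-01-24
  • Original format: PDF
  • Applicant/assignee: RAKESH GUPTA; LEV RATINOV
  • Application number: US20080249809
  • Inventors: RAKESH GUPTA; LEV RATINOV
  • Filing date: 2008-10-10
  • Classification: G06F7; G06F17/30
  • Country: US
  • Added to database: 2022-08-21 17:26:27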
