DISTRIBUTED METHOD FOR INTEGRATING DATA MINING AND TEXT CATEGORIZATION TECHNIQUES

Abstract

A method for prediction analysis using text categorization is provided. The method includes the steps of: grouping a plurality of text documents into a plurality of classes; selecting the top m most discriminatory terms for each class of documents using statistics-based measures; determining, for each document, the presence or absence of each of the discriminatory terms; learning rule-based models of each class of documents using a rule learning algorithm; determining, for at least a portion of the plurality of documents, whether a given learned rule has been satisfied by each respective document; creating a database of the rules associated with the documents satisfying those rules; and performing distributed data mining to form a predictive result based on at least a portion of the plurality of documents.
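The steps listed in the abstract can be illustrated with a small, single-machine sketch. Everything concrete here is an assumption made for illustration, not the patent's implementation: the chi-square statistic stands in for the unspecified "statistical based measures", a single-condition presence/absence rule learner stands in for the unspecified rule learning algorithm, the class labels are taken as given (the grouping step), and the function names `top_terms`, `learn_rules`, and `build_rule_table` are hypothetical. The distributed data mining step is not sketched because the abstract does not describe it.

```python
def chi_square(n11, n10, n01, n00):
    """Chi-square statistic for a 2x2 term-presence / class-membership table."""
    n = n11 + n10 + n01 + n00
    denom = (n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00)
    return 0.0 if denom == 0 else n * (n11 * n00 - n10 * n01) ** 2 / denom


def top_terms(docs, labels, target, m):
    """Return the m terms whose presence/absence best separates `target`."""
    scores = {}
    for term in set().union(*docs):
        # counts[(term present?, document in target class?)]
        counts = {(p, c): 0 for p in (True, False) for c in (True, False)}
        for doc, label in zip(docs, labels):
            counts[(term in doc, label == target)] += 1
        scores[term] = chi_square(counts[(True, True)], counts[(True, False)],
                                  counts[(False, True)], counts[(False, False)])
    # Ties are broken alphabetically so the result is deterministic.
    return sorted(scores, key=lambda t: (-scores[t], t))[:m]


def learn_rules(docs, labels, terms_by_class, min_precision=1.0):
    """Toy rule learner: keep 'term present/absent => class' rules that are
    at least `min_precision` accurate on the documents they cover."""
    rules = []
    for target, terms in terms_by_class.items():
        for term in terms:
            for present in (True, False):
                covered = [y for d, y in zip(docs, labels)
                           if (term in d) == present]
                if covered and covered.count(target) / len(covered) >= min_precision:
                    rules.append((target, term, present))
    return rules


def satisfies(doc, rule):
    """True if the document meets the rule's presence/absence condition."""
    _, term, present = rule
    return (term in doc) == present


def build_rule_table(docs, rules):
    """Rows of (doc_id, rule_id) for every rule a document satisfies."""
    return [(doc_id, rule_id)
            for doc_id, doc in enumerate(docs)
            for rule_id, rule in enumerate(rules)
            if satisfies(doc, rule)]


# Made-up documents, each reduced to its set of terms, with given class labels.
docs = [
    {"engine", "failure", "leak"},
    {"engine", "failure", "smoke"},
    {"routine", "inspection", "ok"},
    {"routine", "check", "ok"},
]
labels = ["fault", "fault", "normal", "normal"]

terms_by_class = {c: top_terms(docs, labels, c, m=2) for c in sorted(set(labels))}
rules = learn_rules(docs, labels, terms_by_class)
rule_table = build_rule_table(docs, rules)
```

In this sketch `rule_table` plays the role of the rule database: each row links a document to a rule it satisfies, so a downstream mining step could treat satisfied rules as structured attributes alongside other data.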

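Bibliographic data

  • Publication number: WO2008042264A3

  • Publication date: 2008-07-24

  • Original format: PDF

  • Applicant/assignee: INFERX CORPORATION; HADJARIAN ALI

  • Application number: WO2007US20938

  • Inventor: HADJARIAN ALI

  • Filing date: 2007-09-28

  • Classification: G06E1

  • Country: WO

  • Database entry time: 2022-08-21 20:00:06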
