
Predicting data for document attributes based on aggregated data for repeated URL patterns


Abstract

One or more hierarchies of string patterns are generated from a plurality of URL strings according to a pattern extraction procedure. Repeated string patterns are selected from the generated hierarchies of string patterns. A URL class is defined for each of the selected repeated string patterns. Each URL class is associated with a respective group of URL strings in the plurality of URL strings, where each URL string in the group contains the repeated string pattern that defines the URL class. Respective aggregated data is calculated for each URL class, based on the respective data of the document referenced by each URL string in the group associated with that URL class. Respective data for a document referenced by a lookup-URL is then predicted based on the aggregated data of one or more of the URL classes.
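The steps in the abstract can be illustrated with a minimal Python sketch. It is not the patented procedure: the abstract does not specify the pattern extraction procedure, so the sketch assumes a simple hierarchy of host-plus-path prefixes, treats a pattern as "repeated" when at least min_count URLs share it, uses the arithmetic mean as the aggregated data, and predicts from the most specific matching URL class. The function names (pattern_hierarchy, build_url_classes, predict) and the min_count parameter are hypothetical.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def pattern_hierarchy(url):
    """Return prefix patterns for a URL, ordered from most general to most specific.

    Assumption: the hierarchy is host, host/seg1, host/seg1/seg2, ...
    URLs are expected to include a scheme (e.g. "http://").
    """
    parts = urlsplit(url)
    segments = [s for s in parts.path.split("/") if s]
    patterns = [parts.netloc]
    for seg in segments:
        patterns.append(patterns[-1] + "/" + seg)
    return patterns


def build_url_classes(url_data, min_count=2):
    """Define one URL class per repeated pattern and aggregate its data.

    url_data maps each known URL to a numeric value for its document.
    Returns {pattern: mean value of all URLs containing that pattern}.
    """
    groups = defaultdict(list)
    for url, value in url_data.items():
        for pattern in pattern_hierarchy(url):
            groups[pattern].append(value)
    # Keep only patterns shared by at least min_count URLs (assumed threshold).
    return {
        pattern: sum(values) / len(values)
        for pattern, values in groups.items()
        if len(values) >= min_count
    }


def predict(lookup_url, url_classes, default=None):
    """Predict data for lookup_url from its most specific matching URL class."""
    for pattern in reversed(pattern_hierarchy(lookup_url)):
        if pattern in url_classes:
            return url_classes[pattern]
    return default
```

For example, given known values for two pages under example.com/news/2020 and one under example.com/blog, a lookup-URL under example.com/news/2020 is predicted from the mean of the two news pages, while a lookup-URL elsewhere on example.com falls back to the site-wide mean. A URL on an unseen host matches no class and returns the default.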
