Predicting data for document attributes based on aggregated data for repeated URL patterns
Abstract
One or more hierarchies of string patterns are generated from a plurality of URL strings according to a pattern extraction procedure. Repeated string patterns are selected from the generated hierarchies of string patterns. A URL class is defined for each of the selected repeated string patterns. Each URL class is associated with a respective group of URL strings in the plurality of URL strings, where each URL string in the group contains the repeated string pattern that defines the URL class. Respective aggregated data is calculated for each URL class, based on the respective data of the document referenced by each URL string in the group associated with that URL class. Respective data for a document referenced by a lookup URL is then predicted based on the aggregated data of one or more of the URL classes.
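
The steps in the abstract can be illustrated with a minimal Python sketch. It is not the patented procedure: the abstract does not specify the pattern extraction procedure, the aggregation function, or how URL classes are combined, so this sketch assumes host-plus-path prefixes as the pattern hierarchy, an arithmetic mean as the aggregate, and the most specific matching class as the predictor. The function names, the `min_count` threshold, the example URLs, and the scores are all hypothetical.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def prefix_patterns(url):
    """Return host/path prefixes of a URL, most general first (assumed hierarchy)."""
    parts = urlsplit(url)
    segments = [s for s in parts.path.split("/") if s]
    patterns = [parts.netloc]
    # Skip the last segment: it is assumed to identify a single document.
    for seg in segments[:-1]:
        patterns.append(patterns[-1] + "/" + seg)
    return patterns


def build_url_classes(url_data, min_count=2):
    """Define a URL class per repeated pattern and average its documents' data."""
    groups = defaultdict(list)
    for url, value in url_data.items():
        for pattern in prefix_patterns(url):
            groups[pattern].append(value)
    # A pattern counts as "repeated" if at least min_count URLs contain it.
    return {p: sum(v) / len(v) for p, v in groups.items() if len(v) >= min_count}


def predict(lookup_url, classes):
    """Predict data for a lookup URL from its most specific matching URL class."""
    for pattern in reversed(prefix_patterns(lookup_url)):
        if pattern in classes:
            return classes[pattern]
    return None  # no URL class matches the lookup URL


# Hypothetical documents with a known per-document score.
known = {
    "http://example.com/news/2020/a.html": 8,
    "http://example.com/news/2020/b.html": 6,
    "http://example.com/news/2021/c.html": 4,
    "http://example.com/shop/item1": 1,
}
classes = build_url_classes(known)
print(predict("http://example.com/news/2020/z.html", classes))
print(predict("http://example.com/news/2021/d.html", classes))
```

In this sketch the first lookup matches the class `example.com/news/2020`, while the second falls back to the broader class `example.com/news` because `example.com/news/2021` occurs only once and is therefore not a repeated pattern.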