Journal: Expert Systems with Applications

Web taxonomy integration with hierarchical shrinkage algorithm and fine-grained relations



Abstract

We address the problem of integrating web taxonomies from different real Internet applications. Integrating web taxonomies means transferring instances from a source taxonomy to a target taxonomy. Unlike conventional text categorization, taxonomy integration can exploit extra information in the source taxonomy to improve categorization. The major existing methods fall into two types: those that use neighboring categories to smooth the document term vector, and those that use the semantic relationships between corresponding categories of the target and source taxonomies to facilitate categorization. In contrast to the first type of approach, which uses only a flattened hierarchy for smoothing, we apply a hierarchical shrinkage algorithm that smooths child documents with their parents. We also discuss the effect of using different hierarchical levels for smoothing. To extend the second type of approach, we extract fine-grained semantic relationships, which capture the relationships between lower-level categories. In addition, we measure the semantic relationships with cosine similarity, which achieves better performance than existing methods. Finally, we integrate the existing approaches and the proposed methods into one machine learning model to find the best feature configuration. Experiments on real Internet data show that our system outperforms standard text classifiers by about 10%.
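
The two building blocks named in the abstract, smoothing a child's term vector with its parent and comparing categories by cosine similarity, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the toy tokens, and the fixed mixing weight are all assumptions made for illustration, and the abstract does not say how the shrinkage weights are actually set.

```python
import math
from collections import Counter


def term_frequencies(tokens):
    """Return relative term frequencies for a list of tokens."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {term: count / total for term, count in counts.items()}


def shrink(child, parent, weight=0.7):
    """Smooth a child term distribution toward its parent.

    `weight` is a hypothetical fixed mixing weight on the child;
    a real system would likely tune or learn it.
    """
    terms = set(child) | set(parent)
    return {
        term: weight * child.get(term, 0.0) + (1.0 - weight) * parent.get(term, 0.0)
        for term in terms
    }


def cosine(u, v):
    """Cosine similarity between two sparse term vectors (dicts)."""
    dot = sum(value * v.get(term, 0.0) for term, value in u.items())
    norm_u = math.sqrt(sum(value * value for value in u.values()))
    norm_v = math.sqrt(sum(value * value for value in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # an empty vector is treated as similar to nothing
    return dot / (norm_u * norm_v)


# Toy data (assumed): a document in a child category and its parent category.
doc = term_frequencies(["camera", "lens", "camera"])
parent_category = term_frequencies(["camera", "electronics"])
smoothed_doc = shrink(doc, parent_category, weight=0.5)

# Similarity between the smoothed document and a hypothetical target category.
target_category = term_frequencies(["electronics", "camera", "tripod"])
similarity = cosine(smoothed_doc, target_category)
```

After smoothing, the document gains nonzero weight on the parent-only term `electronics`, so it can match a target category that shares that term even though the raw document never mentioned it.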
