Journal: Journal of Web Semantics

A novel XML document structure comparison framework based-on sub-tree commonalities and label semantics


Abstract

XML similarity evaluation has become a central issue in the database and information communities, with applications ranging over document clustering, version control, data integration and ranked retrieval. Various algorithms for comparing hierarchically structured data, XML documents in particular, have been proposed in the literature. Most of them make use of techniques for finding the edit distance between tree structures, XML documents being commonly modeled as Ordered Labeled Trees. Yet, a thorough investigation of current approaches led us to identify several similarity aspects, i.e., sub-tree related structural and semantic similarities, which are not sufficiently addressed while comparing XML documents. In this paper, we provide an integrated and fine-grained comparison framework to deal with both structural and semantic similarities in XML documents (detecting the occurrences and repetitions of structurally and semantically similar sub-trees), and to allow the end-user to adjust the comparison process according to her requirements. Our framework consists of four main modules for (i) discovering the structural commonalities between sub-trees, (ii) identifying sub-tree semantic resemblances, (iii) computing tree-based edit operation costs, and (iv) computing tree edit distance. Experimental results demonstrate higher comparison accuracy with respect to alternative methods, while timing experiments reflect the impact of semantic similarity on overall system performance.
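
To make the notion of edit distance between Ordered Labeled Trees more concrete, here is a minimal sketch of a plain, unit-cost tree edit distance over XML element trees. It is not the framework described in the abstract: it ignores sub-tree commonalities and label semantics, and every insert, delete, or relabel costs 1. The function names (`to_tree`, `forest_distance`, `xml_edit_distance`) and the sample documents are hypothetical, chosen only for illustration.

```python
import xml.etree.ElementTree as ET
from functools import lru_cache


def to_tree(elem):
    """Convert an Element into a hashable (label, children) tuple.

    Only element tags are kept; text and attributes are ignored.
    """
    return (elem.tag, tuple(to_tree(child) for child in elem))


def size(forest):
    """Count the nodes in an ordered forest (a tuple of trees)."""
    return sum(1 + size(children) for _, children in forest)


@lru_cache(maxsize=None)
def forest_distance(f1, f2):
    """Unit-cost edit distance between two ordered forests.

    Uses the standard recursion on the rightmost tree of each forest.
    Assumption: all operations cost 1 (a relabel between equal labels costs 0).
    """
    if not f1:
        return size(f2)  # insert every remaining node of f2
    if not f2:
        return size(f1)  # delete every remaining node of f1
    (label1, kids1), (label2, kids2) = f1[-1], f2[-1]
    # Delete the rightmost root of f1; its children take its place.
    delete = forest_distance(f1[:-1] + kids1, f2) + 1
    # Insert the rightmost root of f2; its children take its place.
    insert = forest_distance(f1, f2[:-1] + kids2) + 1
    # Map the two rightmost roots onto each other.
    relabel = (
        forest_distance(f1[:-1], f2[:-1])
        + forest_distance(kids1, kids2)
        + (0 if label1 == label2 else 1)
    )
    return min(delete, insert, relabel)


def xml_edit_distance(xml_a, xml_b):
    """Edit distance between the element trees of two XML strings."""
    a = to_tree(ET.fromstring(xml_a))
    b = to_tree(ET.fromstring(xml_b))
    return forest_distance((a,), (b,))


doc_a = "<paper><title/><author/><author/></paper>"
doc_b = "<paper><title/><author/></paper>"
doc_c = "<article><title/><author/></article>"

print(xml_edit_distance(doc_a, doc_b))  # one <author> deleted
print(xml_edit_distance(doc_b, doc_c))  # root relabeled
```

In a framework like the one summarized above, the constant costs in `forest_distance` would be the natural place to plug in costs that depend on sub-tree structure or label similarity; how the paper actually computes those costs is not stated in the abstract.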
