A novel XML document structure comparison framework based-on sub-tree commonalities and label semantics

Joe Tekli; Richard Chbeir


Abstract

XML similarity evaluation has become a central issue in the database and information communities, its applications ranging over document clustering, version control, data integration and ranked retrieval. Various algorithms for comparing hierarchically structured data, XML documents in particular, have been proposed in the literature. Most of them make use of techniques for finding the edit distance between tree structures, XML documents being commonly modeled as Ordered Labeled Trees. Yet, a thorough investigation of current approaches led us to identify several similarity aspects, i.e., sub-tree related structural and semantic similarities, which are not sufficiently addressed while comparing XML documents. In this paper, we provide an integrated and fine-grained comparison framework to deal with both structural and semantic similarities in XML documents (detecting the occurrences and repetitions of structurally and semantically similar sub-trees), and to allow the end-user to adjust the comparison process according to her requirements. Our framework consists of four main modules for (i) discovering the structural commonalities between sub-trees, (ii) identifying sub-tree semantic resemblances, (iii) computing tree-based edit operations costs, and (iv) computing tree edit distance. Experimental results demonstrate higher comparison accuracy with respect to alternative methods, while timing experiments reflect the impact of semantic similarity on overall system performance.
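To make the idea of edit distance over Ordered Labeled Trees concrete, here is a minimal sketch in Python. It is not the authors' framework: it assumes a simple Selkow-style top-down distance in which whole sub-trees are inserted or deleted at a cost equal to their node count, and root labels are compared with a unit cost. The names `subtree_size`, `exact_label_cost`, and `tree_distance` are hypothetical, and the `label_cost` parameter only marks where a semantic label similarity measure could be plugged in.

```python
import xml.etree.ElementTree as ET


def subtree_size(node):
    """Number of element nodes in the sub-tree rooted at `node`."""
    return 1 + sum(subtree_size(child) for child in node)


def exact_label_cost(tag_a, tag_b):
    """Assumed unit relabel cost: 0 for identical tags, 1 otherwise."""
    return 0.0 if tag_a == tag_b else 1.0


def tree_distance(a, b, label_cost=exact_label_cost):
    """Selkow-style top-down edit distance between two ordered labeled trees.

    Assumption for illustration: inserting or deleting a sub-tree costs
    its number of nodes; relabeling a node costs label_cost(tag_a, tag_b).
    """
    kids_a, kids_b = list(a), list(b)
    m, n = len(kids_a), len(kids_b)

    # d[i][j] = cost of turning the first i children of `a`
    # into the first j children of `b`.
    d = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        d[i][0] = d[i - 1][0] + subtree_size(kids_a[i - 1])  # delete sub-tree
    for j in range(1, n + 1):
        d[0][j] = d[0][j - 1] + subtree_size(kids_b[j - 1])  # insert sub-tree
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            d[i][j] = min(
                d[i - 1][j] + subtree_size(kids_a[i - 1]),
                d[i][j - 1] + subtree_size(kids_b[j - 1]),
                d[i - 1][j - 1]
                + tree_distance(kids_a[i - 1], kids_b[j - 1], label_cost),
            )
    return label_cost(a.tag, b.tag) + d[m][n]


# Two small documents that differ by one deleted and one inserted element.
doc_a = ET.fromstring("<paper><title/><author/><abstract/></paper>")
doc_b = ET.fromstring("<paper><title/><abstract/><keywords/></paper>")
print(tree_distance(doc_a, doc_b))  # 2.0
```

Passing a different `label_cost` function, for example one that returns a value between 0 and 1 for related tags, would let semantically close labels contribute less to the distance. How the paper actually weights sub-tree commonalities and label semantics is not specified in this abstract.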

Bibliographic details

Source: Journal of Web Semantics, 2012, Issue 3, pp. 14-40 (27 pages)
Authors: Joe Tekli; Richard Chbeir
Author affiliations:

ICMC Computer Science and Statistics Institute, University of Sao Paulo, 13566-590 Sao Carlos, SP, Brazil

LE2I Laboratory UMR-CNRS, University of Bourgogne, 21078 Dijon Cedex, France

Original format: PDF
Language: English
Keywords: XML (semi-structured) data; structural similarity; tree edit distance; semantic similarity; information retrieval; vector space model

Similar literature

1. Semantics expression of multimedia structured document STOIC with XML and its estimation [J]. Kenji Echizennya, Kazuaki Matsuda, Takashi Tomii. 電子情報通信学会技術研究報告. データ工学. Data Engineering. 2001, Issue 191.
2. A framework for multi-document abstractive summarization based on semantic role labelling [J]. Khan Atif, Salim Naomie, Kumar Yogan Jaya. Applied Soft Computing. 2015.
3. Storage Method of XML Documents Based-on the Pri-order Labeling Scheme [C]. Liwen Yue, Jiadong Ren, Ying Qian. International Workshop on Education Technology and Computer Science. 2009.
4. Comparing XML Documents as Reference-aware Labeled Ordered Trees [D]. Mikhaiel, Rimon A. E. 2011.
5. A Preliminary Analysis of XMLs Potential Role in Representing the Semantics and Structure of the Oncology Patient Record [O]. Catherine Arnott Smith, Henry J. Lowe. 2001.
6. A novel XML document structure comparison framework based-on sub-tree commonalities and label semantics [O]. Tekli Joe, Chbeir Richard. 2012.
