A new hierarchical clustering model for speeding up the reconciliation of XML-based, semistructured data in mediation systems.


Abstract

This dissertation describes the research, design, and implementation of a Data Merge Engine (DME). Specifically, we have developed a hierarchical clustering model as a new solution for speeding up the merging of similar and overlapping data items from multiple information sources. We use a tree-based heuristic algorithm to cluster data in a multi-dimensional metric space. Equivalence of data objects within each cluster is determined using a number of distance functions that calculate the semantic distance between objects based on their attribute values. Because the number of data items to be compared varies, we have developed a set of heuristics to reconcile data items appropriately. The experimental results show that our approach is more efficient and provides more accurate results than existing approaches.

Given the immense popularity of the World Wide Web (Web), we focus mainly on reconciling semistructured data. Specifically, we use the Extensible Markup Language (XML) as our internal data model for representing heterogeneous data. As part of our research, we have developed a comprehensive classification of the schematic and semantic conflicts that can occur when merging data from related XML-based information sources.

The research presented here is conducted within the context of the Integration Wizard (IWIZ) system, which allows users to access and retrieve information from multiple sources through a consistent, integrated view. To improve query response time, IWIZ uses a combined mediation/data warehousing approach to information integration.
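
The abstract does not give the details of the DME's tree-based heuristic, so the following is only a minimal sketch of the general idea it names: grouping records by attribute-based distance so that equivalence checks can be confined to each cluster. It swaps in plain single-linkage agglomerative clustering, which is not tuned for speed. The attribute names (`title`, `year`), the weights, the threshold, and all function names are hypothetical and chosen purely for illustration.

```python
from difflib import SequenceMatcher
from itertools import combinations


def string_distance(a: str, b: str) -> float:
    """Distance in [0, 1]; 0 means the strings match (ignoring case)."""
    return 1.0 - SequenceMatcher(None, a.lower(), b.lower()).ratio()


def year_distance(a: int, b: int) -> float:
    """Distance in [0, 1], saturating at a gap of 10 years."""
    return min(abs(a - b), 10) / 10.0


# Hypothetical schema: one distance function and one weight per attribute.
ATTRIBUTES = {
    "title": (string_distance, 0.7),
    "year": (year_distance, 0.3),
}


def record_distance(r1: dict, r2: dict) -> float:
    """Weighted sum of the per-attribute distances between two records."""
    return sum(w * fn(r1[name], r2[name]) for name, (fn, w) in ATTRIBUTES.items())


def cluster(records: list, threshold: float) -> list:
    """Single-linkage agglomerative clustering over record indices.

    Repeatedly merges the two closest clusters and stops once the
    closest pair is farther apart than `threshold`.
    """
    clusters = [[i] for i in range(len(records))]

    def linkage(c1, c2):
        return min(record_distance(records[i], records[j]) for i in c1 for j in c2)

    while len(clusters) > 1:
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: linkage(clusters[p[0]], clusters[p[1]]),
        )
        if linkage(clusters[a], clusters[b]) > threshold:
            break
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # safe: combinations() guarantees a < b
    return [sorted(c) for c in clusters]


# Illustrative records only; not data from the dissertation.
records = [
    {"title": "Data Merge Engine", "year": 2001},
    {"title": "data merge engine", "year": 2001},
    {"title": "Query Optimization in XML Stores", "year": 1995},
]

if __name__ == "__main__":
    print(cluster(records, threshold=0.2))
```

In this sketch the first two records fall into one cluster because their titles differ only in case, so a later reconciliation step would compare them with each other but not with the third record.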
