A new hierarchical clustering model for speeding up the reconciliation of XML-based, semistructured data in mediation systems.

Abstract

This dissertation describes the research, design, and implementation of a Data Merge Engine (DME). Specifically, we have developed a hierarchical clustering model as a new solution for speeding up the merging of similar and overlapping data items from multiple information sources. We use a tree-based heuristic algorithm to cluster data in a multi-dimensional metric space. Equivalence of data objects within each cluster is determined using a number of distance functions that calculate the semantic distance between objects based on their attribute values. Because the number of data items to be compared varies widely, we have developed a set of heuristics to reconcile data items appropriately. The experimental results show that our approach is more efficient and more accurate than existing approaches.

Given the immense popularity of the World Wide Web (Web), we focus mainly on reconciling semistructured data. Specifically, we use the Extensible Markup Language (XML) as our internal data model for representing heterogeneous data. As part of our research, we have developed a comprehensive classification of the schematic and semantic conflicts that can occur when merging data from related XML-based information sources.

The research described here was conducted within the context of the Integration Wizard (IWIZ) system, which allows users to access and retrieve information from multiple sources through a consistent, integrated view. To improve query response time, IWIZ uses a combined mediation/data warehousing approach to information integration.
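The abstract does not give the details of the clustering algorithm or the distance functions, so the following Python sketch only illustrates the general idea of clustering first and comparing within clusters. It is not the dissertation's method: it substitutes a simple greedy leader clustering for the tree-based heuristic, and the names `attribute_distance`, `record_distance`, `cluster`, and `equivalent_pairs`, the weights, the thresholds, and the sample records are all hypothetical.

```python
from difflib import SequenceMatcher
from itertools import combinations


def attribute_distance(a, b):
    """Distance in [0, 1] between two attribute values (an assumed definition)."""
    if a is None or b is None:
        return 1.0  # a missing value is treated as maximally distant
    if isinstance(a, (int, float)) and isinstance(b, (int, float)):
        scale = max(abs(a), abs(b), 1)
        return min(abs(a - b) / scale, 1.0)
    # Strings: one minus the case-insensitive similarity ratio.
    return 1.0 - SequenceMatcher(None, str(a).lower(), str(b).lower()).ratio()


def record_distance(r1, r2, weights):
    """Weighted mean of the per-attribute distances."""
    total = sum(weights.values())
    return sum(
        w * attribute_distance(r1.get(key), r2.get(key))
        for key, w in weights.items()
    ) / total


def cluster(records, weights, radius):
    """Greedy leader clustering (a stand-in, not the tree-based heuristic):
    a record joins the first cluster whose first member lies within `radius`."""
    clusters = []
    for rec in records:
        for members in clusters:
            if record_distance(rec, members[0], weights) <= radius:
                members.append(rec)
                break
        else:
            clusters.append([rec])
    return clusters


def equivalent_pairs(clusters, weights, threshold):
    """Compare records only inside each cluster, which avoids most
    of the all-pairs comparisons across the whole data set."""
    pairs = []
    for members in clusters:
        for r1, r2 in combinations(members, 2):
            if record_distance(r1, r2, weights) <= threshold:
                pairs.append((r1["id"], r2["id"]))
    return pairs


# Hypothetical records, as they might look after extraction from XML sources.
records = [
    {"id": 1, "title": "Database Systems", "year": 1999},
    {"id": 2, "title": "Database systems", "year": 1999},
    {"id": 3, "title": "Graph Theory", "year": 2001},
]
weights = {"title": 0.8, "year": 0.2}

clusters = cluster(records, weights, radius=0.3)
matches = equivalent_pairs(clusters, weights, threshold=0.1)
```

In this sketch the expensive pairwise check runs only among records that already share a cluster, which is the source of the speed-up; the cluster radius trades the number of comparisons saved against the risk of separating true duplicates.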

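Bibliographic record

Author: Pluempitiwiriyawej, Charnyote
Author affiliation: University of Florida
Degree-granting institution: University of Florida
Discipline: Computer Science
Degree: Ph.D.
Year: 2001
Pages: 122 p.
Total pages: 122
Original format: PDF
Language: English
Chinese Library Classification: Automation technology, computer technology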

