Deduplication Algorithms for DataBases and Data Warehouses


Abstract

Data deduplication is an important step in heterogeneous data integration. It improves the quality of the integrated data, so that knowledge extracted from these data is more accurate. In this paper we present two sequential algorithms, improvements over the Swoosh algorithms, to eliminate similar data. These algorithms are based on the Match and Merge functions that we have defined. The Match function is based on similarity distances computed according to the type of data. The Merge function uses logical rules. We evaluate the algorithms experimentally on randomly generated data.
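
The abstract does not give the details of the two algorithms or of the Match and Merge functions, so the following is only a minimal sketch of the general idea, not the authors' method. It uses the basic R-Swoosh-style loop from the Swoosh family that the paper builds on: compare each record against the already resolved set, and when two records match, merge them and process the merged record again. The names `field_similarity`, `match`, `merge`, `deduplicate`, the 0.8 threshold, the per-type similarity measures, and the merge rules are all illustrative assumptions.

```python
from difflib import SequenceMatcher


def field_similarity(a, b):
    """Similarity in [0, 1], chosen by value type (assumed measures)."""
    if isinstance(a, (int, float)) and isinstance(b, (int, float)):
        largest = max(abs(a), abs(b))
        if largest == 0:
            return 1.0
        # Relative difference, clamped so opposite signs do not go negative.
        return max(0.0, 1.0 - abs(a - b) / largest)
    # Fall back to a case-insensitive string comparison.
    return SequenceMatcher(None, str(a).lower(), str(b).lower()).ratio()


def match(r1, r2, threshold=0.8):
    """Assumed Match: mean similarity over fields present in both records."""
    shared = [k for k in r1
              if k in r2 and r1[k] is not None and r2[k] is not None]
    if not shared:
        return False
    score = sum(field_similarity(r1[k], r2[k]) for k in shared) / len(shared)
    return score >= threshold


def merge(r1, r2):
    """Assumed Merge rules: prefer non-missing values, then longer strings."""
    merged = {}
    for key in {**r1, **r2}:
        a, b = r1.get(key), r2.get(key)
        if a is None:
            merged[key] = b
        elif b is None:
            merged[key] = a
        elif isinstance(a, str) and isinstance(b, str):
            merged[key] = a if len(a) >= len(b) else b
        else:
            merged[key] = a  # otherwise keep the first record's value
    return merged


def deduplicate(records, threshold=0.8):
    """R-Swoosh-style loop: merge matches until no two records match."""
    pending = list(records)
    resolved = []
    while pending:
        current = pending.pop()
        buddy = next((r for r in resolved if match(current, r, threshold)), None)
        if buddy is None:
            resolved.append(current)
        else:
            # Merged records go back to pending: they may now match others.
            resolved.remove(buddy)
            pending.append(merge(current, buddy))
    return resolved


records = [
    {"name": "John Smith", "age": 42},
    {"name": "Jon Smith", "age": 42},
    {"name": "Alice Martin", "age": 30},
]
print(deduplicate(records))
```

Putting a merged record back into the pending list is what lets the loop reach a fixed point: a merged record can carry more information than either source record and may match records that neither source matched on its own.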


Bibliographic information

Source: 21st international conference on software engineering and data engineering 2012, 2012, pp. 73-78 (6 pages)
Conference location: Los Angeles CA (US)
Authors: F. BOUFARES; A. BEN SALEM; S. CORREIA
Author affiliations:

Laboratory LIPN - UMR 7030 - CNRS University Paris 13 Villetaneuse, 93430, France;

Laboratory LIPN - UMR 7030 - CNRS University Paris 13 Villetaneuse, 93430, France;

Team RD Company Talend Suresnes, 92150, France;

Original format: PDF
Text language: English

