International Conference on Environmental and Computer Science

An Algorithm for detecting similar data in replicated databases using Multi Criteria decision making

Abstract

Duplicate data can cause many problems in all types of databases, especially distributed and replicated databases. Such data threaten consistency and increase redundancy, two important problems in databases. For many reasons, databases or replicas may contain similar records that look different but refer to the same real-world entity. These reasons include entry errors, non-standard abbreviations, differing details across database schemas, packet loss, and noisy environments. This paper proposes an approach for detecting duplicate or similar data across the replicas of a distributed or replicated database, where faulty or noisy values cause copies of the same data to be treated as different data. A multi-criteria decision-making algorithm is employed for this purpose. To detect identical records, priorities are first defined for the fields, and the percentage similarity between records is then evaluated. The algorithm's time overhead is reduced by evaluating the fields in a special priority order. The multi-criteria decision-making algorithm is also used to decide how records should be combined with each other and which record is the complete and true one. An instance-based learning approach is employed to learn how to set priorities for the various fields, to create a uniform schema, and to find each record's appropriate match in the other replicas.
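The pipeline described in the abstract (prioritized fields, percentage similarity, priority ordering to cut time overhead, then a multi-criteria decision to pick and combine records) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' algorithm: the field names, weights, threshold, `updated`/`trust` attributes and all helper functions are hypothetical; field priorities are fixed by hand rather than learned with an instance-based learner; string similarity uses Python's `difflib.SequenceMatcher`; and simple additive weighting (SAW) stands in as one common MCDM method, since the abstract does not say which one the paper uses.

```python
from difflib import SequenceMatcher

# Hypothetical field priorities (weights). The paper learns priorities with an
# instance-based learner; here they are fixed by hand for illustration.
FIELD_WEIGHTS = {"national_id": 0.4, "name": 0.3, "birth_date": 0.2, "city": 0.1}

# Hypothetical MCDM criteria and weights used to rank duplicate versions.
CRITERIA_WEIGHTS = {"completeness": 0.4, "recency": 0.35, "source_trust": 0.25}


def field_similarity(a, b):
    """Return a similarity in [0, 1]; empty or missing values score 0."""
    if not a or not b:
        return 0.0
    return SequenceMatcher(None, str(a).lower(), str(b).lower()).ratio()


def is_duplicate(r1, r2, weights=FIELD_WEIGHTS, threshold=0.8):
    """Weighted percent-similarity test, visiting fields by priority.

    Comparing high-priority fields first lets us stop as soon as even perfect
    matches on the remaining fields could not reach the threshold.
    """
    total = sum(weights.values())
    score, remaining = 0.0, total
    for field, w in sorted(weights.items(), key=lambda kv: kv[1], reverse=True):
        remaining -= w
        score += w * field_similarity(r1.get(field), r2.get(field))
        if score + remaining < threshold * total:
            return False  # early rejection: threshold is now unreachable
    return score >= threshold * total


def saw_scores(alternatives, weights=CRITERIA_WEIGHTS):
    """Simple additive weighting: min-max normalise each (benefit) criterion,
    then sum the weighted normalised values per alternative."""
    scores = [0.0] * len(alternatives)
    for crit, w in weights.items():
        vals = [alt[crit] for alt in alternatives]
        lo, hi = min(vals), max(vals)
        for i, v in enumerate(vals):
            norm = 1.0 if hi == lo else (v - lo) / (hi - lo)
            scores[i] += w * norm
    return scores


def merge_duplicates(versions):
    """Rank duplicate versions with SAW, then fill the best one's gaps.

    Each version is a dict with keys 'data' (the record), 'updated'
    (a timestamp) and 'trust' (a hypothetical per-replica reliability).
    """
    criteria = [
        {
            "completeness": sum(1 for f in FIELD_WEIGHTS if v["data"].get(f))
            / len(FIELD_WEIGHTS),
            "recency": v["updated"],
            "source_trust": v["trust"],
        }
        for v in versions
    ]
    ranked = [
        v["data"]
        for _, v in sorted(zip(saw_scores(criteria), versions),
                           key=lambda pair: pair[0], reverse=True)
    ]
    merged = dict(ranked[0])
    for other in ranked[1:]:
        for field, value in other.items():
            if not merged.get(field) and value:
                merged[field] = value
    return merged


# Usage: two replicas hold slightly different copies of the same person.
a = {"national_id": "A123", "name": "John Smith", "birth_date": "1980-02-01", "city": ""}
b = {"national_id": "A123", "name": "Jon Smith", "birth_date": "1980-02-01", "city": "Boston"}
if is_duplicate(a, b):
    print(merge_duplicates([
        {"data": a, "updated": 200, "trust": 0.9},
        {"data": b, "updated": 100, "trust": 0.5},
    ]))
# prints {'national_id': 'A123', 'name': 'John Smith', 'birth_date': '1980-02-01', 'city': 'Boston'}
```

Sorting fields by descending weight is what makes the early exit effective: a mismatch on a heavily weighted field (here `national_id`) rules a pair out before the cheaper, low-priority fields are compared at all.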
