Journal of Jiangsu Normal University (Natural Science Edition) (《江苏师范大学学报(自然科学版)》)

A Detection Method for Approximately Duplicate Records Based on Twice Fuzzy Evaluation

         

Abstract

Database integration produces large numbers of approximately duplicate records, and field matching algorithms are one of the main methods for detecting and cleaning them. To address the excessive subjectivity of the grading method for determining attribute weights, an improved detection method based on twice fuzzy evaluation is proposed. First, the attributes are evaluated with the grading method, and some unimportant, low-grade attributes are removed. Second, a second fuzzy evaluation is applied to the remaining attributes, and the attribute weights are obtained by averaging the resulting attribute grades. Finally, the data set is divided into groups, and approximately duplicate records are detected within each group in parallel. Theoretical analysis and experimental results show that the method not only improves running efficiency but also further improves the precision and recall of duplicate detection.
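The steps in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the abstract does not give the grading scale, the fuzzy evaluation procedure, the field matching function, the grouping rule, or the threshold. Here a higher grade is assumed to mean a more important attribute, the second evaluation is simplified to averaging grades from several hypothetical evaluators, the averaged grades are normalized into weights, and `difflib.SequenceMatcher` stands in for the field matching algorithm. The names `attribute_weights`, `record_similarity`, `detect_duplicates`, `min_grade`, `group_key`, and `threshold` are all hypothetical.

```python
from collections import defaultdict
from difflib import SequenceMatcher
from itertools import combinations


def attribute_weights(first_grades, second_grades, min_grade):
    """Derive attribute weights from two rounds of grading.

    first_grades:  {attribute: grade} from the first evaluation.
    second_grades: {attribute: [grades]} from the second evaluation.
    Assumption: higher grade = more important attribute.
    """
    # Round 1: drop attributes whose grade is below the cutoff.
    kept = [a for a, g in first_grades.items() if g >= min_grade]
    # Round 2: average the grades given to each remaining attribute.
    avg = {a: sum(second_grades[a]) / len(second_grades[a]) for a in kept}
    # Assumption: normalize the averages so the weights sum to 1.
    total = sum(avg.values())
    return {a: v / total for a, v in avg.items()}


def record_similarity(r1, r2, weights):
    """Weighted sum of per-field string similarities (stand-in matcher)."""
    return sum(
        w * SequenceMatcher(None, r1[a], r2[a]).ratio()
        for a, w in weights.items()
    )


def detect_duplicates(records, weights, group_key, threshold):
    """Group records, then compare pairs only within each group."""
    groups = defaultdict(list)
    for idx, rec in enumerate(records):
        groups[group_key(rec)].append(idx)
    pairs = []
    # Groups are independent, so they could be processed in parallel;
    # this sketch processes them sequentially for simplicity.
    for members in groups.values():
        for i, j in combinations(members, 2):
            if record_similarity(records[i], records[j], weights) >= threshold:
                pairs.append((i, j))
    return pairs
```

Comparing pairs only within a group reduces the number of comparisons relative to an all-pairs scan, and dropping low-grade attributes shortens each comparison, which is consistent with the efficiency gain the abstract reports.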
