Journal: BMC Medical Informatics and Decision Making
An efficient record linkage scheme using graphical analysis for identifier error detection

Abstract

Background: Integration of information on individuals (record linkage) is a key problem in healthcare delivery, epidemiology, and "business intelligence" applications. It is now common to be required to link very large numbers of records, often containing various combinations of theoretically unique identifiers, such as NHS numbers, which are both incomplete and error-prone.

Methods: We describe a two-step record linkage algorithm in which identifiers with high cardinality are identified or generated, and used to perform an initial exact-match-based linkage. Subsequently, the resulting clusters are studied and, if appropriate, partitioned using a graph-based algorithm detecting erroneous identifiers.

Results: The system was used to cluster over 250 million health records from five data sources within a large UK hospital group. Linkage, which was completed in about 30 minutes, yielded 3.6 million clusters of which about 99.8% contain, with high likelihood, records from one patient. Although computationally efficient, the algorithm's requirement for exact matching of at least one identifier of each record to another for cluster formation may be a limitation in some databases containing records of low identifier quality.

Conclusions: The technique described offers a simple, fast and highly efficient two-step method for large-scale initial linkage for records commonly found in the UK's National Health Service.
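The two-step idea in the Methods can be illustrated with a small Python sketch. It is not the authors' implementation: the abstract does not specify the graph algorithm, so step 2 here uses an assumed heuristic that flags an identifier value as suspect when it is the only link holding a cluster together. The field names (`nhs`, `mrn`, `lab_id`), the sample records, and the function names `link_exact` and `bridging_identifiers` are all hypothetical.

```python
from collections import defaultdict


def link_exact(records):
    """Step 1: cluster records that share any exact (field, value) identifier."""
    parent = list(range(len(records)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    first_seen = {}  # (field, value) -> index of first record carrying it
    for idx, rec in enumerate(records):
        for key in rec.items():
            if key[1] is None:
                continue  # missing identifiers never link records
            if key in first_seen:
                parent[find(idx)] = find(first_seen[key])
            else:
                first_seen[key] = idx

    clusters = defaultdict(list)
    for idx in range(len(records)):
        clusters[find(idx)].append(idx)
    return sorted(clusters.values())


def bridging_identifiers(records, cluster):
    """Step 2 (assumed heuristic, not from the paper): return identifier
    values whose removal would split the cluster into several parts."""
    shared = defaultdict(list)
    for idx in cluster:
        for key in records[idx].items():
            if key[1] is not None:
                shared[key].append(idx)

    suspects = []
    for key, members in shared.items():
        if len(members) < 2:
            continue  # an identifier on one record links nothing
        reduced = [
            {f: v for f, v in records[i].items() if (f, v) != key}
            for i in cluster
        ]
        if len(link_exact(reduced)) > 1:
            suspects.append(key)
    return sorted(suspects)


# Hypothetical records: two patients joined only by one shared lab_id.
records = [
    {"nhs": "111", "mrn": "A1", "lab_id": None},
    {"nhs": "111", "mrn": "A1", "lab_id": "L5"},
    {"nhs": "222", "mrn": "B7", "lab_id": None},
    {"nhs": "222", "mrn": "B7", "lab_id": "L5"},
    {"nhs": "333", "mrn": "C9", "lab_id": None},
]

clusters = link_exact(records)
suspects = bridging_identifiers(records, clusters[0])
```

In this toy data the first four records fall into one cluster after step 1, and step 2 singles out the shared `lab_id` value as the only bridge between two otherwise well-supported groups. A flagged identifier is a candidate for review under this heuristic, not proof of an error; a production system would need additional evidence before partitioning a cluster.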