BMC Medical Informatics and Decision Making

Record linkage under suboptimal conditions for data-intensive evaluation of primary care in Rio de Janeiro, Brazil

Abstract

Linking Brazilian databases demands the development of algorithms and processes to deal with various challenges, including the large size of the databases, the small number and poor quality of the personal identifiers available for comparison (a national security number is not mandatory), and characteristics of Brazilian names that make the linkage process prone to errors. This study aims to describe and evaluate the quality of the processes used to create an individual-linked database for data-intensive research on the impact of the expansion of primary care on health indicators in Rio de Janeiro City, Brazil.

We created an individual-level dataset linking social benefits recipients, primary health care, hospital admission and mortality data. The databases were pre-processed, and we adopted a multiple-approach strategy combining deterministic and probabilistic record linkage techniques with an extensive clerical review of the potential matches. Relying on manual review as the gold standard, we estimated the false match (false-positive) proportion of each approach (deterministic, probabilistic, clerical review) and the missed match (false-negative) proportion of the clerical review approach. To assess the sensitivity (recall) in identifying the deaths of social benefits recipients, we used their vital status as registered in the primary care database as the gold standard.

In all linkage processes, the deterministic approach identified most of the matches. However, the proportion of matches identified by each approach varied. The false match proportion was around 1% or less in almost all approaches. The missed match proportion in the clerical review approach was under 3% in all linkage processes. We estimated a recall of 93.6% (95% CI 92.8–94.3) for the linkage between social benefits recipients and mortality data.

Adopting a linkage strategy that combines pre-processing routines, deterministic and probabilistic techniques, and an extensive clerical review minimized linkage errors in the context of suboptimal data quality.
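
The multiple-approach strategy described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say which identifiers, comparison rules, probabilistic model or thresholds were used. The sketch assumes three hypothetical fields (`name`, `mother_name`, `birth_date`), uses Fellegi-Sunter-style agreement weights as one common choice for the probabilistic step, and all m/u probabilities and score thresholds are made-up values for illustration.

```python
import math
import unicodedata
from difflib import SequenceMatcher

# Hypothetical per-field probabilities (assumed, not taken from the study):
# m = P(field agrees | same person), u = P(field agrees | different people).
FIELDS = {
    "name": (0.95, 0.01),
    "mother_name": (0.90, 0.01),
    "birth_date": (0.97, 0.003),
}
UPPER, LOWER = 10.0, 0.0  # assumed score thresholds


def normalize(text):
    """Pre-processing: strip accents, uppercase, collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", text)
    no_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    return " ".join(no_accents.upper().split())


def agrees(field, a, b):
    """Exact agreement for dates, fuzzy agreement for names."""
    if field == "birth_date":
        return a == b
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio() >= 0.9


def deterministic_match(a, b):
    """Deterministic pass: every normalized field must be identical."""
    return all(normalize(a[f]) == normalize(b[f]) for f in FIELDS)


def probabilistic_score(a, b):
    """Sum of log2 agreement/disagreement weights over all fields."""
    score = 0.0
    for field, (m, u) in FIELDS.items():
        if agrees(field, a[field], b[field]):
            score += math.log2(m / u)
        else:
            score += math.log2((1 - m) / (1 - u))
    return score


def classify(a, b):
    """Deterministic pass first, then probabilistic score with a review band."""
    if deterministic_match(a, b):
        return "deterministic match"
    score = probabilistic_score(a, b)
    if score >= UPPER:
        return "probabilistic match"
    if score > LOWER:
        return "clerical review"
    return "non-match"
```

With these assumed settings, a pair that differs only in accents and letter case is accepted by the deterministic pass, a pair with a small typo in the name falls through to the probabilistic pass, and a pair that agrees on both names but not on the birth date lands in the band reserved for clerical review. In a real linkage of large databases, candidate pairs would first be restricted by some blocking scheme, which this sketch omits.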