BMC Medical Informatics and Decision Making

Estimating the re-identification risk of clinical data sets


Abstract

Background: De-identification is a common way to protect patient privacy when disclosing clinical data for secondary purposes, such as research. One type of attack that de-identification protects against is linking the disclosed patient data with public and semi-public registries. Uniqueness is a commonly used measure of re-identification risk under this attack. If uniqueness can be measured accurately, then the risk from this kind of attack can be managed. In practice, it is often not possible to measure uniqueness directly, so it must be estimated.

Methods: We evaluated the accuracy of uniqueness estimators on clinically relevant data sets. Four candidate estimators were identified, either because they had been evaluated in the past and found to have good accuracy or because they were new and had not been evaluated comparatively before: the Zayatz estimator, the slide negative binomial estimator, Pitman's estimator, and mu-argus. A Monte Carlo simulation was performed to evaluate the uniqueness estimators on six clinically relevant data sets. We varied the sampling fraction and the uniqueness in the population (the value being estimated). The median relative error and inter-quartile range of the uniqueness estimates were measured across 1000 runs.

Results: No single estimator performed well across all of the conditions. We developed a decision rule that selects between the Pitman, slide negative binomial, and Zayatz estimators depending on the sampling fraction and the difference between estimates. This decision rule consistently had the best median relative error across multiple conditions and data sets.

Conclusion: This study identified an accurate decision rule that can be used by health privacy researchers and disclosure control professionals to estimate uniqueness in clinical data sets. The decision rule provides a reliable way to measure re-identification risk.
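The evaluation loop described in the Methods can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' code: the synthetic population, the quasi-identifiers (age, sex, region), and the function names are all hypothetical, and the "estimator" is simply the uniqueness observed in the sample, a naive stand-in for the Zayatz, slide negative binomial, Pitman, and mu-argus estimators, none of which is implemented here.

```python
import random
import statistics
from collections import Counter


def uniqueness(records):
    """Fraction of records whose quasi-identifier combination occurs exactly once."""
    counts = Counter(records)
    return sum(1 for r in records if counts[r] == 1) / len(records)


def make_population(size=2000, seed=1):
    """Hypothetical synthetic population of (age, sex, region) tuples."""
    rng = random.Random(seed)
    return [
        (rng.randrange(90), rng.choice("MF"), rng.randrange(20))
        for _ in range(size)
    ]


def simulate(population, sampling_fraction, runs=1000, seed=0):
    """Return (median relative error, inter-quartile range) over repeated samples.

    Assumes the population contains at least one unique record, so that the
    true uniqueness is non-zero and the relative error is defined.
    """
    rng = random.Random(seed)
    true_value = uniqueness(population)
    n = max(1, round(sampling_fraction * len(population)))
    errors = []
    for _ in range(runs):
        sample = rng.sample(population, n)  # simple random sample without replacement
        estimate = uniqueness(sample)       # naive stand-in, NOT one of the paper's estimators
        errors.append((estimate - true_value) / true_value)
    q1, median, q3 = statistics.quantiles(errors, n=4)
    return median, q3 - q1


if __name__ == "__main__":
    population = make_population()
    for fraction in (0.1, 0.5, 0.9):
        median, iqr = simulate(population, fraction, runs=200)
        print(f"sampling fraction {fraction}: median rel. error {median:.3f}, IQR {iqr:.3f}")
```

At small sampling fractions the naive stand-in overestimates population uniqueness, because many records that share a quasi-identifier combination in the population appear alone in the sample. That bias is the reason dedicated estimators are needed.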
