Source: Medical Informatics in Europe Conference

Blinded Anonymization: a method for evaluating cancer prevention programs under restrictive data protection regulations

Abstract

Evaluating cancer prevention programs requires collecting and linking case-level data from multiple sources within the healthcare system. This has to comply with data protection regulations, which are restrictive in Germany and will likely become stricter across Europe. To facilitate the mortality evaluation of the German mammography screening program, which covers more than 10 million eligible women, we developed a method that does not require written individual consent and complies with existing privacy regulations. Our setup consists of several data owners, a data collection center (DCC) and an evaluation center (EC). Each data owner uses dedicated software that pre-processes plain-text personal identifiers (IDAT) and plain-text evaluation data (EDAT) so that only irreversibly encrypted record assignment numbers (RAN) and pre-aggregated, reversibly encrypted EDAT are transmitted to the DCC. The DCC uses the RANs to perform a probabilistic record linkage based on an established and evaluated algorithm. For potentially identifying attributes within the EDAT ('quasi-identifiers'), we developed a novel process named 'blinded anonymization'. It allows a specific generalization to be selected from the pre-processed and encrypted attribute aggregations in order to create a new data set with assured k-anonymity, without using any plain-text information. The anonymized data is transferred to the EC, where the EDAT is decrypted and used for evaluation. Our concept was approved by German data protection authorities. We implemented a prototype and tested it with more than 1.5 million simulated records containing realistically distributed IDAT. The core processes performed well with regard to performance parameters. We created different generalizations and calculated the respective suppression rates.
We discuss modalities, implications and limitations for large data sets in the cancer registry domain, as well as approaches for further improvement, such as l-diversity and automatic computation of 'optimal' generalizations.
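
The idea of choosing a generalization and computing its suppression rate without access to plain text can be illustrated with a minimal Python sketch. It is not the authors' implementation: the keyed hash (HMAC-SHA256), the key, the attribute names (`age`, `zip`), the generalization hierarchies and the sample records are all assumptions made for illustration, and the reversible encryption of the EDAT is left out. Each data owner pre-computes a blinded token for every generalization level of each quasi-identifier; the collection center then groups records by the tokens of a chosen level and counts how many records fall into groups smaller than k.

```python
import hashlib
import hmac
from collections import Counter

# Hypothetical key known to the data owners but not to the collection center.
SECRET_KEY = b"demo-key-not-from-the-paper"


def blind(value: str, key: bytes = SECRET_KEY) -> str:
    """Keyed one-way hash: equal inputs yield equal tokens."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()


def generalize_age(age: int, level: int) -> str:
    """Assumed hierarchy: 0 = exact age, 1 = 5-year band, 2 = 10-year band."""
    if level == 0:
        return str(age)
    width = 5 if level == 1 else 10
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def generalize_zip(zip_code: str, level: int) -> str:
    """Assumed hierarchy: level n masks the last n digits."""
    keep = max(len(zip_code) - level, 0)
    return zip_code[:keep] + "*" * (len(zip_code) - keep)


def preprocess(record, max_level=2):
    """Data-owner side: one blinded token per attribute and level."""
    age, zip_code = record
    levels = range(max_level + 1)
    return {
        "age": [blind(f"age|{lv}|{generalize_age(age, lv)}") for lv in levels],
        "zip": [blind(f"zip|{lv}|{generalize_zip(zip_code, lv)}") for lv in levels],
    }


def suppression_rate(blinded_records, age_level, zip_level, k):
    """Collection-center side: operates on tokens only.

    Records whose equivalence class has fewer than k members would have
    to be suppressed to reach k-anonymity at the chosen levels.
    """
    classes = Counter(
        (r["age"][age_level], r["zip"][zip_level]) for r in blinded_records
    )
    suppressed = sum(n for n in classes.values() if n < k)
    return suppressed / len(blinded_records)


# Made-up sample records: (age, postal code).
records = [(52, "48149"), (53, "48149"), (54, "48151"),
           (61, "48153"), (67, "48155"), (58, "44135")]
blinded = [preprocess(r) for r in records]

for levels in [(0, 0), (1, 1), (2, 2)]:
    rate = suppression_rate(blinded, *levels, k=2)
    print(levels, round(rate, 3))
```

In this toy data set, coarser generalizations leave fewer records in undersized groups, which is the trade-off between information loss and suppression that the abstract refers to.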
