Journal: JMIR Medical Informatics

Evaluation of the Privacy Risks of Personal Health Identifiers and Quasi-Identifiers in a Distributed Research Network: Development and Validation Study


Abstract

Background: Medical data that include patient information must be protected for privacy. A distributed research network (DRN) is one attempt to address the challenge of protecting privacy while encouraging multi-institutional clinical research. A DRN standardizes multi-institutional data into a common structure and terminology called a common data model (CDM), and it shares only analysis results. Even though no data are shared, it is still necessary to measure how well a DRN protects patient privacy in practice.

Objective: This study aimed to quantify the privacy risk of a DRN by comparing different deidentification levels, focusing on personal health identifiers (PHIs) and quasi-identifiers (QIs).

Methods: We identified PHIs and QIs that threaten privacy in an Observational Medical Outcomes Partnership (OMOP) CDM, based on the 18 identifiers of the Health Insurance Portability and Accountability Act of 1996 (HIPAA) and on previous studies. To compare the privacy risk under different privacy policies, we generated limited and safe harbor data sets, based on 16 PHIs and 12 QIs identified as threatening privacy, from the Synthetic Public Use File 5 Percent (SynPUF5PCT) data set, which is a public data set of the OMOP CDM. Using minimum cell size and equivalence class methods, we measured the privacy risk reduction as a trust differential gap obtained by comparing the two data sets. We also measured the gap in randomly sampled records from the two data sets to adjust for the number of PHI or QI records.

Results: The gaps averaged 31.448% and 73.798% for PHIs and QIs, respectively, with a minimum cell size of one, which represents a unique record in a data set. Among PHIs, the national provider identifier had the highest gap of 71.236% (71.244% and 0.007% in the limited and safe harbor data sets, respectively). The maximum equivalence class size, that is, the size of the largest set of indistinguishable records, averaged 771. In 1000 random samples of PHIs, Device_exposure_start_date had the highest gap of 33.730% (87.705% and 53.975% in the data sets). Among QIs, Death had the highest gap of 99.212% (99.997% and 0.784% in the data sets). In 1000, 10,000, and 100,000 random samples of QIs, Device_treatment had the highest gaps of 12.980% (99.980% and 87.000% in the data sets), 60.118% (99.831% and 39.713%), and 93.597% (98.805% and 5.207%), respectively, and in 1 million random samples, Death had the highest gap of 99.063% (99.998% and 0.934% in the data sets).

Conclusions: In this study, we verified and quantified the privacy risk of PHIs and QIs in the DRN. Although this study used a limited set of PHIs and QIs for verification, the privacy limitations it found could serve as a quality measurement index for deidentification in multi-institutional collaborative research, thereby increasing DRN safety.
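
The measurement described in the Methods can be illustrated with a small sketch. It is not the authors' implementation: the abstract does not define the metrics formally, so this assumes that the risk at a minimum cell size k is the percentage of records falling in an equivalence class of size k or smaller, and that the gap is the limited-data-set risk minus the safe-harbor risk (which matches the reported figures up to rounding, for example 87.705% - 53.975% = 33.730%). The toy records, column names, and function names are hypothetical.

```python
import random
from collections import Counter


def equivalence_classes(records, columns):
    """Count how many records share each combination of values on `columns`."""
    return Counter(tuple(rec[c] for c in columns) for rec in records)


def risk_at_cell_size(records, columns, k=1):
    """Percentage of records in an equivalence class of size <= k (assumed metric)."""
    classes = equivalence_classes(records, columns)
    at_risk = sum(size for size in classes.values() if size <= k)
    return 100.0 * at_risk / len(records)


def max_class_size(records, columns):
    """Size of the largest set of records that cannot be told apart."""
    return max(equivalence_classes(records, columns).values())


def gap(limited, safe_harbor, columns, k=1):
    """Assumed definition of the gap: risk(limited) - risk(safe harbor)."""
    return (risk_at_cell_size(limited, columns, k)
            - risk_at_cell_size(safe_harbor, columns, k))


def sampled_gap(limited, safe_harbor, columns, n, seed=0, k=1):
    """Gap on the same n randomly chosen row positions of both data sets."""
    idx = random.Random(seed).sample(range(len(limited)), n)
    return gap([limited[i] for i in idx], [safe_harbor[i] for i in idx], columns, k)


# Hypothetical toy data: the limited data set keeps full dates and ZIP codes.
limited = [
    {"birth_date": "1950-03-14", "zip": "02139"},
    {"birth_date": "1950-07-02", "zip": "02139"},
    {"birth_date": "1950-11-23", "zip": "02139"},
    {"birth_date": "1962-01-09", "zip": "94110"},
]
# Safe-harbor-style coarsening: year only, first three ZIP digits.
safe_harbor = [
    {"birth_date": r["birth_date"][:4], "zip": r["zip"][:3]} for r in limited
]

cols = ["birth_date", "zip"]
print(risk_at_cell_size(limited, cols))      # every record is unique
print(risk_at_cell_size(safe_harbor, cols))  # only the 1962 record is unique
print(gap(limited, safe_harbor, cols))
print(max_class_size(safe_harbor, cols))
```

In this toy case all four limited records are unique, while coarsening leaves only one unique record and merges the other three into a single equivalence class, so the gap is large. The study's own procedure for sampling and for handling individual OMOP CDM fields may differ from this sketch.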
