Journal: BMC Medical Informatics and Decision Making

A privacy-preserving distributed filtering framework for NLP artifacts


Abstract

Sharing medical data is a major challenge in biomedicine and often hinders collaborative research. Because of privacy concerns, clinical notes cannot be shared directly. Considerable effort has been devoted to de-identifying clinical notes, but accurately locating and scrubbing all sensitive elements automatically remains very challenging. An alternative approach is to remove sentences that might contain sensitive terms related to personal information. A previous study introduced a frequency-based filtering approach that removes sentences containing low-frequency bigrams, improving privacy protection without significantly decreasing utility. Our work extends this method to clinical notes held by distributed sources, taking security and privacy into account. We developed a novel secure protocol based on private set intersection and secure thresholding to identify uncommon and low-frequency terms, which can then be used to guide sentence filtering. Because the computational cost of the proposed framework depends mostly on the cardinality of the intersection of the sets and on the number of data owners, we evaluated the framework with respect to these two factors. Experimental results demonstrate that the proposed method is scalable across various experimental settings. We also evaluated the framework in terms of data utility; this evaluation shows that the proposed method retains enough information for data analysis. This work demonstrates the feasibility of using homomorphic encryption to develop a secure and efficient multi-party protocol.
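
The frequency-based filtering idea described in the abstract can be illustrated with a minimal plaintext sketch. This is not the authors' secure protocol: the pooling of counts across data owners is done in the clear here, whereas the paper performs it with private set intersection and secure thresholding. The tokenizer, the counting rule, the `min_count` threshold, and all function names are assumptions made for illustration only.

```python
from collections import Counter
import re


def bigrams(sentence):
    """Return lowercased word bigrams (naive regex tokenizer, an assumption)."""
    tokens = re.findall(r"[a-z0-9]+", sentence.lower())
    return list(zip(tokens, tokens[1:]))


def pooled_bigram_counts(sites):
    """Sum bigram counts over all data owners.

    Done in plaintext purely for illustration; the paper's framework
    performs the equivalent step without revealing each owner's terms.
    """
    counts = Counter()
    for sentences in sites:
        for sentence in sentences:
            counts.update(bigrams(sentence))
    return counts


def filter_sentences(sentences, counts, min_count):
    """Keep sentences in which every bigram occurs at least min_count times."""
    return [
        s for s in sentences
        if all(counts[b] >= min_count for b in bigrams(s))
    ]


# Two hypothetical data owners, each holding a few sentences.
site_a = ["Patient denies chest pain.", "Mr. Smith lives in Springfield."]
site_b = ["Patient denies chest pain today.", "Patient denies fever."]

counts = pooled_bigram_counts([site_a, site_b])
kept = filter_sentences(site_a, counts, min_count=2)
print(kept)
```

In this toy run, the sentence containing a name and a place is dropped because its bigrams are rare in the pooled data, while the common clinical phrase is kept. A sentence with fewer than two tokens has no bigrams and is therefore always kept under this rule, which a real deployment would likely need to handle separately.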
