Conference: SIAM International Conference on Data Mining

Efficient Algorithms for Masking and Finding Quasi-Identifiers


Abstract

A quasi-identifier is a subset of attributes that can uniquely identify most tuples in a table. Incautious publication of quasi-identifiers leads to privacy leakage. In this paper we consider the problems of finding and masking quasi-identifiers. Both problems are provably hard, with severe time and space requirements. We focus on designing efficient approximation algorithms for large data sets. We first propose two natural measures for quantifying quasi-identifiers: distinct ratio and separation ratio. We develop efficient algorithms that find small quasi-identifiers with provable size and separation/distinct ratio guarantees, with space and time requirements sublinear in the number of tuples. We also propose efficient algorithms for masking quasi-identifiers, where we use a random sampling technique to greatly reduce the space and time requirements without much sacrifice in the quality of the results. Our algorithms for masking and finding quasi-identifiers naturally apply to stream databases. Extensive experimental results on real-world data sets confirm the efficiency and accuracy of our algorithms.
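The abstract names the two measures but does not define them. As a minimal sketch, assume the common reading: the distinct ratio of an attribute set is the number of distinct projections divided by the number of tuples, and the separation ratio is the fraction of tuple pairs that differ on at least one attribute in the set. The function names, the `rows` table, and its columns below are hypothetical and serve only as illustration. This computes the measures exactly by scanning the whole table; it is not the sublinear algorithm the paper proposes.

```python
from collections import Counter

def distinct_ratio(rows, attrs):
    """Number of distinct projections onto attrs, divided by the number of rows."""
    projections = {tuple(row[a] for a in attrs) for row in rows}
    return len(projections) / len(rows)

def separation_ratio(rows, attrs):
    """Fraction of unordered row pairs that differ on at least one attribute in attrs.

    Requires at least two rows, otherwise there are no pairs to count.
    """
    n = len(rows)
    total_pairs = n * (n - 1) // 2
    # Rows sharing the same projection form groups; pairs inside a group are not separated.
    counts = Counter(tuple(row[a] for a in attrs) for row in rows)
    unseparated = sum(c * (c - 1) // 2 for c in counts.values())
    return (total_pairs - unseparated) / total_pairs

# Hypothetical toy table (assumed data, not from the paper).
rows = [
    {"zip": "94305", "age": 30, "sex": "F"},
    {"zip": "94305", "age": 30, "sex": "M"},
    {"zip": "94305", "age": 41, "sex": "F"},
    {"zip": "10001", "age": 30, "sex": "F"},
]

print(distinct_ratio(rows, ["zip", "age"]))    # 3 distinct projections out of 4 rows
print(separation_ratio(rows, ["zip", "age"]))  # 5 of the 6 pairs are separated
```

On this toy table, adding `sex` to the attribute set raises both ratios to 1, so under these assumed definitions `{zip, age, sex}` would act as a key while `{zip, age}` would be only a quasi-identifier.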
