首页> 外文会议>IEEE International Conference on Computer and Communications >Memory-Compact Membership Lookup for Multiple Data Sets by a Single Bloom Filter
【24h】

Memory-Compact Membership Lookup for Multiple Data Sets by a Single Bloom Filter

机译:通过单个Bloom筛选器对多个数据集进行内存紧凑的成员资格查找

获取原文

摘要

Bloom filter is a memory-compact data structure to encode a set of data items, which can address the set membership query with no false negative and a configurable false positive rate. It is a fundamental tool with a wide range of applications in multiple disciplines, such as data science, networking, computer architecture, and distributed computing. However, Bloom filter faces a challenge of memory allocation: How much memory should be given to its data structure when its encoded data set is dynamically formed and has no prior-known set size. As a result, when more set elements continuously arrives, its data structure will become more crowded, causing its false positive rate of addressing membership query to increase. This problem becomes even more challenging, when there are multiple data sets to represent and each data set is independently formed in a streaming fashion. The traditional way to support the set membership checking for multiple data sets is to allocate each data set a separate Bloom filter. Instead, this paper takes a dramatically different approach: We encode all data sets in a single large filter and yet supports membership lookup for all of them, with a false positive rate bound that is independently configurable for each set. We analyze the properties of the filter and, in particular, the formulas for its feasible region where the false positive rate requirements are met for all data sets.
机译:布隆过滤器是一种存储紧凑的数据结构,用于对一组数据项进行编码,该数据项可以以无误报和可配置的误报率解决该组成员资格查询。它是一种基本工具,在数据科学,网络,计算机体系结构和分布式计算等多个学科中具有广泛的应用。但是,Bloom过滤器面临内存分配的挑战:当动态地形成其编码数据集并且没有已知的集合大小时,应为其数据结构分配多少内存。结果,当更多集合元素连续到达时,其数据结构将变得更加拥挤,从而导致其寻址成员查询的误报率增加。当要表示多个数据集并且每个数据集以流方式独立形成时,此问题将变得更加具有挑战性。支持多个数据集的集合成员资格检查的传统方法是为每个数据集分配一个单独的Bloom筛选器。取而代之的是,本文采用了截然不同的方法:我们在单个大型过滤器中对所有数据集进行编码,但仍支持所有数据集的成员资格查找,每个数据集都可以独立配置误报率范围。我们分析了过滤器的属性,尤其是分析了其中所有数据集都满足误报率要求的可行区域的公式。

著录项

相似文献

  • 外文文献
  • 中文文献
  • 专利
获取原文

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号