首页> 中文期刊> 《微型电脑应用》 >基于布隆过滤器的海量数据查询技术的优化与应用

基于布隆过滤器的海量数据查询技术的优化与应用

         

摘要

The theory and application scenarios of Bloom filter is illustrated by an analysis sample of customer behavior data.During the project Bloom filter can be used to search for large dataset effectively at a rapid rate.At the beginning of this paper,in-memory database,like MongoDB,is used to solve that question,with a lookup time complexity of O(1) after default index (_id) is the only one permitted to save the premium accouts.The disadvantage is that the functionality needed is limited and the pressure brought by concurrent (one to multiple) query becomes bigger as the valume of data increses.Then the accounts can be read into momery througth appropriate data structure using distributed cache.The mode of data access is changed into one-to-one,resulting in the bigger usage of memory.With a small amount of data to be processed,the performace of HashSet is acceptable because of its convience and speed.As the volume of data increases,Heap memory may overflow.Then,a custom data structure is adopted for the Bloom filter.The basic theory and false positive rate are analyzed,the error data (False Positive Error),reduced by Bloom Filter,can be eliminated.Theory analysis and experiment show that the features of low space usage and high search efficiency for Bloom filter are appropriate to solve this problem.%通过一个用户行为数据分析的案例,说明了布隆过滤器的原理和应用场景.在案例中,需要使用MapReduce框架在海量数据中筛选出付费用户相关的数据,布隆过滤器算法提供了一种快速、有效的实现方法.简述了使用MongoDB内存数据库存储付费用户的解决方案,其搜索效率高,但随着数据量的增加,一对多并发查询给服务端带来的压力会越来越大;如果使用分布式缓存的方法,这时为一对一存取,带来的问题是占用内存增大,如果数据结构选择HashSet,存入量大时,则容易使堆内存溢出,故考虑使用自定义数据结构:布隆过滤器,对其原理和误判率进行了分析,并针对其可能产生的错误数据(“假阳性”)提出消除方案,经实验验证,布隆过滤器占用内存低、查找效率高,解决本类问题极为合适.

著录项

相似文献

  • 中文文献
  • 外文文献
  • 专利
获取原文

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号