Optimization and Application of Bloom Filter-Based Query Techniques for Massive Data

Rao Wen (饶文); Chen Xu (陈旭)


Abstract

Using a case study in user-behavior data analysis, this paper explains the principle and application scenarios of the Bloom filter. In the case study, the records related to paying (premium) users must be selected from a massive dataset with the MapReduce framework, and the Bloom filter algorithm provides a fast and effective way to do so. The paper first outlines a solution that stores the paying users' accounts in MongoDB used as an in-memory database, relying only on the default index (_id), which gives O(1) lookups and high search efficiency. Its drawbacks are limited functionality and the growing pressure that one-to-many concurrent queries put on the server as the data volume increases. Alternatively, the accounts can be loaded into memory in a suitable data structure through a distributed cache; data access then becomes one-to-one, at the cost of higher memory usage. With a small amount of data, a HashSet performs acceptably because it is convenient and fast, but as the number of stored entries grows it can easily overflow the heap. A custom data structure, the Bloom filter, is therefore adopted. Its principle and false-positive rate are analyzed, and a scheme is proposed to eliminate the erroneous data (false positives) it may produce. Theoretical analysis and experiments show that the Bloom filter's low memory usage and high lookup efficiency make it well suited to this class of problem.
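The abstract describes the Bloom filter only in words. The following is a minimal Java sketch of the standard structure, written under stated assumptions rather than taken from the paper: a bit array of m bits with k hash positions per key, sized for n expected keys and a target false-positive rate p using the textbook formulas m = -n ln p / (ln 2)^2 and k = (m/n) ln 2, which minimize the approximate false-positive rate (1 - e^(-kn/m))^k. The class name SimpleBloomFilter, the FNV-1a hash with a MurmurHash3-style finalizer, the double-hashing scheme, and the confirmed method (a filter hit followed by an exact check, one common way to remove false positives) are all hypothetical choices for illustration; the paper's own false-positive elimination scheme is not spelled out in the abstract and may differ.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;
import java.util.function.Predicate;

/** Minimal Bloom filter sketch; a hypothetical class, not the authors' implementation. */
class SimpleBloomFilter {
    private final BitSet bits;
    private final int m; // number of bits
    private final int k; // number of hash functions

    /** Sizes the filter for n expected keys and a target false-positive rate p. */
    SimpleBloomFilter(int n, double p) {
        if (n <= 0 || p <= 0 || p >= 1) {
            throw new IllegalArgumentException("require n > 0 and 0 < p < 1");
        }
        double ln2 = Math.log(2);
        // Standard optimum: m = -n ln p / (ln 2)^2, k = (m / n) ln 2.
        this.m = (int) Math.ceil(-n * Math.log(p) / (ln2 * ln2));
        this.k = Math.max(1, (int) Math.round((double) m / n * ln2));
        this.bits = new BitSet(m);
    }

    int bitCount() { return m; }
    int hashCount() { return k; }

    /** 64-bit FNV-1a over the UTF-8 bytes, then a MurmurHash3-style finalizer to mix the bits. */
    private static long hash64(String key) {
        long h = 0xcbf29ce484222325L;
        for (byte b : key.getBytes(StandardCharsets.UTF_8)) {
            h ^= (b & 0xff);
            h *= 0x100000001b3L;
        }
        h ^= h >>> 33;
        h *= 0xff51afd7ed558ccdL;
        h ^= h >>> 33;
        h *= 0xc4ceb9fe1a85ec53L;
        h ^= h >>> 33;
        return h;
    }

    /** Double hashing: position_i = (h1 + i * h2) mod m for i = 0..k-1. */
    private int[] positions(String key) {
        long h = hash64(key);
        long h1 = h & 0xffffffffL;   // low 32 bits
        long h2 = (h >>> 32) | 1L;   // high 32 bits, forced odd so it is never 0
        int[] pos = new int[k];
        for (int i = 0; i < k; i++) {
            pos[i] = (int) ((h1 + i * h2) % m);
        }
        return pos;
    }

    void add(String key) {
        for (int pos : positions(key)) bits.set(pos);
    }

    /** false: the key was definitely never added; true: the key is possibly present. */
    boolean mightContain(String key) {
        for (int pos : positions(key)) {
            if (!bits.get(pos)) return false;
        }
        return true;
    }

    /**
     * Two-stage lookup: the filter cheaply rejects most non-members, and an exact
     * check (hypothetical, e.g. a query against the authoritative account store)
     * removes the remaining false positives.
     */
    static List<String> confirmed(List<String> keys, SimpleBloomFilter filter,
                                  Predicate<String> exactCheck) {
        List<String> out = new ArrayList<>();
        for (String key : keys) {
            if (filter.mightContain(key) && exactCheck.test(key)) out.add(key);
        }
        return out;
    }
}

class BloomFilterDemo {
    public static void main(String[] args) {
        SimpleBloomFilter bf = new SimpleBloomFilter(1000, 0.01);
        for (int i = 0; i < 1000; i++) bf.add("paid-" + i);
        int falsePositives = 0;
        for (int i = 0; i < 10000; i++) {
            if (bf.mightContain("free-" + i)) falsePositives++;
        }
        System.out.printf("m=%d, k=%d, observed false-positive rate=%.4f%n",
                bf.bitCount(), bf.hashCount(), falsePositives / 10000.0);
    }
}
```

In the setting the abstract describes, one plausible arrangement is to build the filter from the paid-user accounts once, distribute it to the mappers (for example through the distributed cache mentioned above), and call mightContain on each record in the map phase, so that only the small set of candidate records reaches the exact check. Unlike a HashSet, the filter's memory is fixed at m bits regardless of key length: about 9.6 bits per key at a 1% target false-positive rate.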

Bibliographic details

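Source
Microcomputer Applications (《微型电脑应用》), 2018, No. 2, pp. 68-71 plus 1 page inserted at the back (5 pages in total)
Authors
Rao Wen (饶文); Chen Xu (陈旭)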
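Affiliations

Nanjing FiberHome Software Technology Co., Ltd. (南京烽火软件科技有限公司), Nanjing 210000;

Nanjing FiberHome StarrySky Communication Development Co., Ltd. (南京烽火星空通信发展有限公司), Nanjing 210000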

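Original format: PDF
Language of full text: Chinese
Chinese Library Classification (CLC): programming, software engineering
Keywords
MapReduce; Bloom filter; dataset; MongoDB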

Similar literature


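1. Research on Fast Query Technology Based on Massive Data [J]. Zhao Linbin, Shao Zhanqiang, Wei Wei. Intelligent City (智能城市), 2019, No. 4.
2. Application Research on a Distributed File Storage System Based on Optimized Management of Massive Data [J]. Sheng Wenting. Business 2.0: Economic Management (商业2.0（经济管理）), 2021, No. 12.
3. Application Research on a Distributed File Storage System Based on Optimized Management of Massive Data [J]. Gao Shangjian, Wei Guo, Yang Gong. Technology Innovation and Application (科技创新与应用), 2020, No. 18.
4. Optimization and Application for Massive Data Processing Problems on an Interactive Platform [J]. Qin Dong. China Cable Television (中国有线电视), 2018, No. 11.
5. Research on Parallel Loading and Query Techniques in the Massive Data Management Platform MDMP [C]. Zhang Li, Yang Shuqiang, Li Aiping. 24th China Database Conference (第二十四届中国数据库学术会议), 2007.
6. Research on Keyword-Based Top-k Query Techniques over Massive Data [A]. Kou Aijun. 2013.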
