With the arrival of the big data era, data volumes have grown explosively. HBase, a non-relational (NoSQL) database, provides enterprise users with a highly scalable system platform. However, HBase uses a B+-tree-like index design and does not support non-primary-key indexes, so queries on non-primary-key fields are inefficient, which makes HBase hard to apply to services with strict real-time requirements. This paper designs and implements an optimization scheme for non-primary-key indexing in a distributed HBase cluster. The scheme uses a Twemproxy-based Redis cluster as the cache layer and proposes a cache replacement algorithm based on accumulated access heat (access frequency), which reduces the resource cost of HBase scans and improves index performance. Experimental results show that, compared with a conventional distributed HBase database, the improved cluster design achieves noticeably better non-primary-key query efficiency, raises the cache hit rate by about 20%, and retains good scalability.
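The abstract does not give the replacement algorithm's exact heat formula, so the following is only a minimal sketch of the general idea and not the paper's implementation. It assumes that each cached entry gains one unit of heat per hit, that heat is periodically multiplied by a decay factor, and that the coldest entry is evicted when the cache is full. The names `HeatCache`, `loader`, `decay`, and `age` are hypothetical. An in-memory dictionary stands in for the Redis cluster, and `loader` stands in for the HBase scan.

```python
from typing import Callable, Dict, Optional


class HeatCache:
    """Bounded cache that evicts the entry with the lowest accumulated heat.

    Illustrative only: the heat update (+1 per hit) and the decay step are
    assumptions, not the formula used in the paper.
    """

    def __init__(self, capacity: int, decay: float = 0.5) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self.decay = decay
        self._values: Dict[str, str] = {}   # stand-in for the Redis cluster
        self._heat: Dict[str, float] = {}   # accumulated heat per cached key
        self.hits = 0
        self.misses = 0

    def get(self, key: str, loader: Callable[[str], Optional[str]]) -> Optional[str]:
        """Return the cached value, or load it from the backing store on a miss."""
        if key in self._values:
            self.hits += 1
            self._heat[key] += 1.0          # a hit accumulates heat
            return self._values[key]
        self.misses += 1
        value = loader(key)                 # stand-in for the expensive HBase scan
        if value is not None:
            self._put(key, value)
        return value

    def _put(self, key: str, value: str) -> None:
        """Insert a new entry, evicting the coldest one if the cache is full."""
        if len(self._values) >= self.capacity:
            coldest = min(self._heat, key=self._heat.get)
            del self._values[coldest]
            del self._heat[coldest]
        self._values[key] = value
        self._heat[key] = 1.0

    def age(self) -> None:
        """Decay all heat values so that old popularity fades over time."""
        for key in self._heat:
            self._heat[key] *= self.decay
```

In this sketch a caller would invoke `cache.get(index_value, scan_hbase)` for every non-primary-key lookup and call `cache.age()` on a timer. The ratio `hits / (hits + misses)` then gives the cache hit rate, the metric the abstract reports.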