Chinese Journal of Computers (《计算机学报》)

Goldfish: A Large-Scale RDF Data Storage and Query System Based on Matrix Decomposition

Abstract

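With the rapid development of Internet applications and semantic web technology, the amount of semantic data is exploding. On the one hand, storing and querying semantic data efficiently is important, because many applications depend on it to provide better services. On the other hand, the rapid growth of semantic data poses new challenges for efficient storage and querying in the big data era. Semantic data has traditionally been stored and queried in relational database management systems, but as the data grows, this approach can hardly cope. To address this problem, this paper proposes a distributed hierarchical storage architecture, built on the OpenRDF Sesame framework, for storing and querying large-scale semantic data. The RDF storage mechanism is optimized by replacing the RDF triple store with attribute tables. To handle big semantic data, a parallel frequent itemset mining algorithm on the Spark framework is proposed to generate the index of the attribute tables. Moreover, an optimized hash-conversion layer is added so that the query stage does not waste time on frequent hash table lookups. To evaluate the efficiency of the proposed approach, we implemented a prototype system called Goldfish and compared it with other systems on large-scale synthetic and real datasets. Experimental results show that Goldfish is around 8 times faster than Rainbow, 500 times faster than Jena-HBase, and 1200 times faster than SHARD, a MapReduce-based RDF query system.

Chinese abstract (translated): With the rapid development of Internet applications and deepening research on semantic web technology, semantic data is growing explosively. On the one hand, efficient storage and querying of semantic data are an important foundation for semantic web applications, and more and more semantic applications rely on them to provide better services. On the other hand, this explosive growth poses new challenges for storing and querying semantic data in big data environments. Traditional semantic data storage and query systems based on relational databases can no longer meet the storage and query demands of large-scale semantic data. Targeting the storage and querying of large-scale RDF data, this paper builds on the OpenRDF Sesame framework, adopts a distributed hierarchical storage architecture, and proposes and implements an attribute-table storage structure for semantic data. On this basis, because the Boolean matrix decomposition algorithm is slow when constructing attribute tables for large-scale semantic data, the paper proposes and implements a parallel frequent itemset mining algorithm on the Spark distributed computing framework to solve large-scale matrix decomposition and thereby accelerate attribute table construction. In addition, query optimizations such as hash conversion are added at the query layer. Finally, based on the proposed index structure and optimization methods, the prototype system Goldfish was designed and implemented, and experimental comparisons were conducted on large-scale synthetic and real datasets. The results show that, on average, Goldfish improves query performance by about 6 times over Rainbow, about 500 times over Jena-HBase, and about 1200 times over SHARD, a MapReduce-based RDF query system.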
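The abstract only names the techniques. What follows is a minimal single-machine sketch of the general idea, not Goldfish's implementation. It assumes that an attribute table groups subjects that share a frequent set of predicates. It uses plain Apriori in place of the paper's Spark-based parallel mining and Boolean matrix decomposition. It also uses a Python dict as a stand-in for the hash-conversion layer. The function names `encode`, `frequent_itemsets`, and `build_attribute_tables` are hypothetical, as are the sample triples.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical single-machine sketch; not Goldfish's Spark implementation.


def encode(triples):
    """Dictionary-encode every IRI/literal to a dense integer ID
    (an assumed stand-in for the hash-conversion layer)."""
    ids = {}
    encoded = [tuple(ids.setdefault(t, len(ids)) for t in triple) for triple in triples]
    return encoded, ids


def frequent_itemsets(transactions, min_support):
    """Level-wise Apriori: returns {frozenset(items): support}."""
    transactions = [frozenset(t) for t in transactions]
    candidates = {frozenset([i]) for t in transactions for i in t}
    frequent, k = {}, 1
    while candidates:
        level = {}
        for c in candidates:
            support = sum(1 for t in transactions if c <= t)
            if support >= min_support:
                level[c] = support
        frequent.update(level)
        k += 1
        # Join step: unions of two frequent (k-1)-itemsets with exactly k items.
        candidates = {a | b for a, b in combinations(level, 2) if len(a | b) == k}
    return frequent


def build_attribute_tables(triples, min_support=2):
    encoded, ids = encode(triples)
    name = {i: t for t, i in ids.items()}  # reverse dictionary for decoding

    # Row view of the Boolean subject x predicate matrix.
    preds_of = defaultdict(set)
    for s, p, _ in encoded:
        preds_of[s].add(p)

    frequent = frequent_itemsets(preds_of.values(), min_support)
    # Assumption: maximal frequent predicate sets become table schemas.
    schemas = [f for f in frequent if not any(f < g for g in frequent)]

    # Each subject goes to the largest schema its predicates cover (ties broken deterministically).
    assigned = {}
    for s, preds in preds_of.items():
        fitting = [f for f in schemas if f <= preds]
        if fitting:
            assigned[s] = max(fitting, key=lambda f: (len(f), sorted(f)))

    tables = defaultdict(dict)  # columns -> {subject: {predicate: [objects]}}
    leftovers = []              # triples kept in a generic triple table
    for s, p, o in encoded:
        schema = assigned.get(s)
        if schema is not None and p in schema:
            cols = tuple(sorted(name[q] for q in schema))
            row = tables[cols].setdefault(name[s], defaultdict(list))
            row[name[p]].append(name[o])
        else:
            leftovers.append((name[s], name[p], name[o]))
    return dict(tables), leftovers


SAMPLE_TRIPLES = [
    ("ex:alice", "foaf:name", '"Alice"'),
    ("ex:alice", "foaf:age", '"30"'),
    ("ex:alice", "foaf:knows", "ex:bob"),
    ("ex:bob", "foaf:name", '"Bob"'),
    ("ex:bob", "foaf:age", '"25"'),
    ("ex:carol", "foaf:name", '"Carol"'),
    ("ex:carol", "foaf:age", '"41"'),
    ("ex:paper1", "dc:title", '"Goldfish"'),
    ("ex:paper1", "dc:creator", "ex:alice"),
    ("ex:paper2", "dc:title", '"Rainbow"'),
    ("ex:paper2", "dc:creator", "ex:bob"),
]

if __name__ == "__main__":
    tables, rest = build_attribute_tables(SAMPLE_TRIPLES)
    for cols, rows in tables.items():
        print(cols, sorted(rows))
    print("triple table:", rest)
```

On `SAMPLE_TRIPLES`, the three people end up in one `(foaf:age, foaf:name)` table and the two papers in a `(dc:creator, dc:title)` table. The single `foaf:knows` triple has an infrequent predicate, so it stays in the fallback triple list. The nested counting loop is quadratic and only suits toy data. A distributed miner, such as the FP-Growth implementation in Spark MLlib, would be the natural replacement at scale. The abstract does not say which mining algorithm Goldfish parallelizes.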