Goldfish: A Large-Scale RDF Data Storage and Query System Based on Matrix Factorization

Gu Rong (顾荣); Qiu Hongjian (仇红剑); Yang Wenjia (杨文家); Hu Wei (胡伟); Yuan Chunfeng (袁春风); Huang Yihua (黄宜华)

Abstract

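English abstract: With the rapid development of Internet applications and Semantic Web technology, the volume of semantic data is growing explosively. On the one hand, storing and querying semantic data efficiently is important, because many applications rely on it to provide better services. On the other hand, this rapid growth poses new challenges for efficient storage and querying of semantic data in the big data era. The traditional approach to semantic data management is to store and query the data in relational database management systems, but this approach can hardly cope as the data grows. To address this problem, this paper proposes a distributed hierarchical storage architecture, based on the OpenRDF Sesame framework, for storing and querying large-scale semantic data. The RDF storage mechanism is optimized by adopting an attribute table in place of the RDF triple store. To handle large-scale semantic data, a parallel frequent itemset mining algorithm on the Spark framework is proposed to generate the index of the attribute table. Moreover, a layer of optimized hash conversion is added to avoid spending time on frequent hash table lookups during the query stage. To evaluate the efficiency of the proposed approach, we implement a prototype system called Goldfish and compare it with existing systems on large-scale synthetic and real datasets. Experimental results show that Goldfish is around 8 times faster than Rainbow, 500 times faster than Jena-HBase, and 1200 times faster than SHARD, a MapReduce-based RDF querying system.

Chinese abstract (translated): With the rapid development of Internet applications and deepening research on Semantic Web technology, semantic data is growing explosively. On the one hand, efficient storage and querying of semantic data is an important foundation for Semantic Web applications, and a growing number of semantic applications rely on it to provide better services. On the other hand, the explosive growth of semantic data poses new challenges for semantic data storage and query technology in big data environments. Traditional semantic data storage and query systems based on relational databases can no longer meet the demands of large-scale semantic data. To address the storage and querying of large-scale RDF data, this paper builds on the OpenRDF Sesame framework, adopts a distributed hierarchical storage architecture, and proposes and implements an attribute-table storage structure for semantic data. On this basis, because the Boolean matrix factorization algorithm is slow at constructing attribute tables for large-scale semantic data, the paper proposes and implements a parallel frequent itemset mining algorithm on the Spark distributed computing framework to solve large-scale matrix factorization and thereby accelerate attribute-table construction. In addition, query optimizations such as hash-based conversion are added at the query layer. Finally, based on the proposed index structure and optimization methods, the prototype system Goldfish is designed and implemented, and comparative experiments are conducted on large-scale synthetic and real datasets. The results show that, on average, Goldfish improves query performance by about 6 times over Rainbow, about 500 times over Jena-HBase, and about 1200 times over SHARD, a MapReduce-based RDF query system.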
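The attribute-table idea mentioned in the abstract can be illustrated with a small sketch. This is not Goldfish's implementation: it runs on a single machine in plain Python rather than on Spark, it enumerates predicate combinations naively instead of using a parallel frequent itemset mining algorithm, and it assumes each subject has at most one value per predicate. The function names and the toy triples are hypothetical, chosen only for illustration.

```python
from collections import Counter, defaultdict
from itertools import combinations

# Toy RDF triples (subject, predicate, object); all values are made up.
TRIPLES = [
    ("s1", "name", "Alice"), ("s1", "age", "30"), ("s1", "email", "a@x.org"),
    ("s2", "name", "Bob"), ("s2", "age", "25"),
    ("s3", "name", "Carol"), ("s3", "age", "41"), ("s3", "homepage", "c.example"),
    ("s4", "title", "Report"),
]


def predicate_sets(triples):
    """Group triples by subject: subject -> {predicate: object}."""
    rows = defaultdict(dict)
    for s, p, o in triples:
        rows[s][p] = o  # simplification: one value per (subject, predicate)
    return rows


def frequent_predicate_sets(rows, min_support):
    """Count every predicate combination per subject (naive, exponential)."""
    counts = Counter()
    for props in rows.values():
        preds = sorted(props)
        for k in range(1, len(preds) + 1):
            for combo in combinations(preds, k):
                counts[combo] += 1
    return {c: n for c, n in counts.items() if n >= min_support}


def build_property_table(triples, min_support=2):
    """Return (columns, table, leftover) for one attribute table."""
    rows = predicate_sets(triples)
    frequent = frequent_predicate_sets(rows, min_support)
    # Use the largest frequent predicate set (ties broken by support) as columns.
    columns = max(frequent, key=lambda c: (len(c), frequent[c]), default=())
    table = {
        s: tuple(props[c] for c in columns)
        for s, props in rows.items()
        if columns and all(c in props for c in columns)
    }
    # Triples not covered by the table stay in a plain triple list.
    leftover = [
        (s, p, o) for s, p, o in triples
        if s not in table or p not in columns
    ]
    return columns, table, leftover


columns, table, leftover = build_property_table(TRIPLES, min_support=2)
```

With the toy data, subjects that share the predicates `age` and `name` end up as rows of one table, so a query touching both predicates reads a single row instead of joining two triple patterns; the remaining triples stay in the fallback triple list.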

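Bibliographic Record

Source
Chinese Journal of Computers (《计算机学报》) | 2017, No. 10 | pp. 2212-2230 | 19 pages
Authors
Gu Rong (顾荣); Qiu Hongjian (仇红剑); Yang Wenjia (杨文家); Hu Wei (胡伟); Yuan Chunfeng (袁春风); Huang Yihua (黄宜华)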
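Author Affiliations

State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing 210093;

Jiangsu Collaborative Innovation Center of Novel Software Technology and Industrialization, Nanjing 210093;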

China Computer Federation (CCF)

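Original format: PDF
Language of text: Chinese
Chinese Library Classification: Programming, software engineering
Keywords
Large-scale RDF storage; matrix factorization; hierarchical storage; big data; Semantic Web; Spark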

