Journal: ACM Journal of Data and Information Quality

Content-based Union and Complement Metrics for Dataset Search over RDF Knowledge Graphs

Abstract

RDF Knowledge Graphs (or Datasets) contain valuable information that can be exploited for a variety of real-world tasks. However, due to the enormous size of the available RDF datasets, it is difficult to discover the most valuable datasets for a given task. To improve dataset discoverability, interlinking, and reusability, there is a trend toward Dataset Search systems. Such systems are mainly based on metadata and ignore the contents; however, tasks related to data integration and enrichment require the contents of datasets to be considered. For instance, dataset owners often want to enrich their dataset by selecting other datasets that provide complementary information. These tasks require content-based union and complement metrics over any subset of datasets, yet such approaches are lacking. To make the computation of such metrics feasible at very large scale, we propose an approach relying on (a) a set of pre-constructed (and periodically refreshed) semantics-aware indexes, and (b) "lattice-based" incremental algorithms that exploit the posting lists of such indexes, as well as set theory properties, to enable efficient responses at query time. Finally, we discuss the efficiency of the proposed methods by presenting comparative results, and we use the proposed metrics to report measurements for 400 real RDF datasets (containing over 2 billion triples).
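To make the idea of content-based union and complement metrics over posting lists more concrete, the following is a minimal sketch and not the authors' implementation. It assumes a toy index that maps each element (for example, an entity URI) to the set of dataset IDs containing it. The dataset IDs, the `ex:` identifiers, and all function names are hypothetical. The sketch enumerates subsets by brute force; it does not reproduce the lattice-based incremental algorithms the abstract refers to.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy index: element -> posting list (IDs of datasets containing it).
index = {
    "ex:Aristotle": {"D1", "D2"},
    "ex:Plato": {"D1"},
    "ex:Socrates": {"D2", "D3"},
    "ex:Athens": {"D3"},
    "ex:Stagira": {"D1", "D2"},
}


def group_posting_lists(index):
    """Count how many elements share each distinct posting list."""
    return Counter(frozenset(ids) for ids in index.values())


def union_size(groups, subset):
    """Number of elements occurring in at least one dataset of `subset`."""
    subset = frozenset(subset)
    return sum(n for ids, n in groups.items() if ids & subset)


def complement_size(groups, focus, subset):
    """Number of elements found in some dataset of `subset` but absent from `focus`."""
    subset = frozenset(subset)
    return sum(
        n for ids, n in groups.items() if focus not in ids and ids & subset
    )


def best_union(groups, datasets, k):
    """Brute-force search for the k datasets whose union covers the most elements."""
    return max(
        combinations(sorted(datasets), k),
        key=lambda subset: union_size(groups, subset),
    )


groups = group_posting_lists(index)
print(union_size(groups, {"D1", "D2"}))             # 4
print(complement_size(groups, "D1", {"D2", "D3"}))  # 2
print(best_union(groups, {"D1", "D2", "D3"}, 2))    # ('D1', 'D3')
```

Grouping elements by identical posting list means each distinct list is inspected once per query, regardless of how many elements share it. Elements are never re-scanned dataset by dataset, which is one simple way an index can reduce query-time work.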
