Published in: International Conference on World Wide Web

Test-driven Evaluation of Linked Data Quality

Abstract

Linked Open Data (LOD) comprises an unprecedented volume of structured data on the Web. However, these datasets vary in quality, ranging from extensively curated datasets to crowdsourced or extracted data of often relatively low quality. We present a methodology for test-driven quality assessment of Linked Data, inspired by test-driven software development. We argue that vocabularies, ontologies and knowledge bases should be accompanied by a number of test cases that help ensure a basic level of quality. The methodology assesses the quality of linked data resources based on a formalization of bad smells and data quality problems. Our formalization employs SPARQL query templates, which are instantiated into concrete quality test case queries. Based on an extensive survey, we compile a comprehensive library of data quality test case patterns. We perform automatic test case instantiation based on schema constraints or semi-automatically enriched schemata, and allow the user to generate specific test case instantiations that are applicable to a schema or dataset. We provide an extensive evaluation of five LOD datasets, manual test case instantiation for five schemas and automatic test case instantiations for all available schemata registered with Linked Open Vocabularies (LOV). One of the main advantages of our approach is that domain-specific semantics can be encoded in the data quality test cases, making it possible to discover data quality problems beyond conventional quality heuristics.
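
The idea of instantiating a SPARQL query template into a concrete test case query can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration and is not taken from the paper: the `%%NAME%%` placeholder syntax, the `RANGE_PATTERN` template, the `instantiate` helper, and the `ex:` vocabulary terms are all hypothetical. The sketch only shows plain string substitution; it does not execute the query against a dataset.

```python
import re

# Hypothetical pattern (not from the paper's library): selects resources of
# class T1 whose value for property P1 violates a numeric bound. Each row
# returned by the instantiated query would count as one violation.
RANGE_PATTERN = (
    "SELECT DISTINCT ?s WHERE { "
    "?s a %%T1%% . ?s %%P1%% ?v . "
    "FILTER (?v %%OP%% %%V1%%) }"
)

# Assumed placeholder syntax: a name wrapped in double percent signs.
PLACEHOLDER = re.compile(r"%%(\w+)%%")


def instantiate(pattern: str, bindings: dict[str, str]) -> str:
    """Replace every %%NAME%% placeholder with its binding.

    Raises KeyError if the pattern uses a placeholder with no binding,
    so an incomplete test case is never produced silently.
    """
    def substitute(match: re.Match) -> str:
        name = match.group(1)
        if name not in bindings:
            raise KeyError(f"no binding for placeholder {name}")
        return bindings[name]

    return PLACEHOLDER.sub(substitute, pattern)


# Hypothetical bindings: flag every ex:Person with ex:height above 3.0.
query = instantiate(
    RANGE_PATTERN,
    {"T1": "ex:Person", "P1": "ex:height", "OP": ">", "V1": "3.0"},
)
print(query)
```

In this sketch a domain expert would write the bindings by hand, which corresponds to manual instantiation. Automatic instantiation, as the abstract describes it, would instead derive the bindings from schema constraints; how that derivation works is not specified here.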