Conference: Linguistic Annotation Workshop

Provenance for Linguistic Corpora Through Nanopublications

Abstract

Research in Computational Linguistics depends on text corpora for training and testing new tools and methodologies. While a plethora of annotated linguistic information exists, these corpora are often not interoperable without significant manual work. Moreover, the annotations may have evolved into different versions, making it challenging for researchers to know the data's provenance. This paper addresses this issue with a case study on event-annotated corpora, creating a new, more interoperable representation of this data in the form of nanopublications. We demonstrate how linguistic annotations from separate corpora can be reliably linked from the start, and thereby be accessed and queried as if they were a single dataset. We describe how such nanopublications can be created and demonstrate how SPARQL queries can be performed to extract interesting content from the new representations. The queries show that information from multiple corpora can be retrieved more easily and effectively because the information from the different corpora is represented in a uniform data format.
