Journal: Language Resources and Evaluation

SenseDefs: a multilingual corpus of semantically annotated textual definitions. Exploiting multiple languages and resources jointly for high-quality Word Sense Disambiguation and Entity Linking

Abstract

Definitional knowledge has proved to be essential in various Natural Language Processing tasks and applications, especially when information at the level of word senses is exploited. However, the few sense-annotated corpora of textual definitions available to date are of limited size: this is mainly due to the expensive and time-consuming process of annotating a wide variety of word senses and entity mentions at a reasonably high scale. In this paper we present SenseDefs, a large-scale high-quality corpus of disambiguated definitions (or glosses) in multiple languages, comprising sense annotations of both concepts and named entities from a wide-coverage unified sense inventory. Our approach for the construction and disambiguation of this corpus builds upon the structure of a large multilingual semantic network and a state-of-the-art disambiguation system: first, we gather complementary information of equivalent definitions across different languages to provide context for disambiguation; then we refine the disambiguation output with a distributional approach based on semantic similarity. As a result, we obtain a multilingual corpus of textual definitions featuring over 38 million definitions in 263 languages, and we publicly release it to the research community. We assess the quality of SenseDefs's sense annotations both intrinsically and extrinsically on Open Information Extraction and Sense Clustering tasks.
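
The two-step idea described in the abstract (pool equivalent definitions across languages as context, then refine the output by semantic similarity) can be illustrated with a toy sketch. This is not the authors' implementation: the function names, the sense identifiers, the two-dimensional vectors, and the 0.5 threshold are all hypothetical and chosen only to show the shape of the pipeline. A real system would use a disambiguation system and learned sense embeddings in place of these stand-ins.

```python
from math import sqrt


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def pool_context(definitions_by_language):
    """Step 1 (assumed form): gather the definitions of one concept from all
    languages into a single context, ordered by language code."""
    return [
        text
        for lang in sorted(definitions_by_language)
        for text in definitions_by_language[lang]
    ]


def refine(candidates, target_vector, sense_vectors, threshold=0.5):
    """Step 2 (assumed form): keep only candidate senses whose vector is
    similar enough to the vector of the concept being defined."""
    kept = []
    for sense in candidates:
        vec = sense_vectors.get(sense)
        if vec is not None and cosine(vec, target_vector) >= threshold:
            kept.append(sense)
    return kept


# Hypothetical toy data: definitions of one chess concept in two languages.
definitions = {
    "en": ["A move in chess involving the king and a rook."],
    "it": ["Mossa degli scacchi che coinvolge il re e una torre."],
}
context = pool_context(definitions)

# Hypothetical sense vectors; the defined concept lies along the first axis.
target = [1.0, 0.0]
vectors = {
    "chess_game": [0.9, 0.1],
    "king_piece": [0.8, 0.3],
    "king_monarch": [0.1, 0.9],
}
kept = refine(["chess_game", "king_piece", "king_monarch"], target, vectors)
print(len(context), kept)
```

In this toy setup the monarch sense of "king" is discarded because its vector points away from the chess concept, which mirrors the kind of refinement the abstract attributes to the similarity-based step.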