Conference: 9th International Conference on Language Resources and Evaluation

Exploring the utility of coreference chains for improved identification of personal names

Abstract

Identifying the real-world entity that a proper name refers to is an important task in many NLP applications. Context plays an important role in disambiguating entities that share a name. In this paper, we discuss a dataset and experimental set-up that allow us to systematically explore the effects of different sizes and types of context in this disambiguation task. We create context by first identifying coreferent expressions in the document and then combining the sentences in which these expressions occur into one informative context. We apply different filters to obtain different levels of coreference-based context. Since hand-labeling a dataset of a decent size is expensive, we investigate the usefulness of an automatically created pseudo-ambiguity dataset. The results on this pseudo-ambiguity dataset show that using coreference-based context performs better than using a fixed window of context around the entity. The insights taken from the pseudo-data experiments can be used to predict how the method works with real data. In our experiments on real data we obtain comparable results.
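The two kinds of context contrasted in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the token-list representation, and the assumption that a coreference resolver has already supplied the indices of the sentences containing chain mentions are all hypothetical choices made for illustration.

```python
from typing import List, Sequence, Set


def window_context(tokens: Sequence[str], mention_index: int, size: int) -> List[str]:
    """Fixed-window baseline: up to `size` tokens on each side of the mention.

    `mention_index` is the position of the name in the flat token list
    (a hypothetical representation, not taken from the paper).
    """
    start = max(0, mention_index - size)
    end = min(len(tokens), mention_index + size + 1)
    return list(tokens[start:end])


def coreference_context(
    sentences: Sequence[Sequence[str]], chain_sentence_ids: Set[int]
) -> List[str]:
    """Coreference-based context: join every sentence that holds a chain mention.

    `chain_sentence_ids` is assumed to come from an external coreference
    resolver; sentences are concatenated in document order.
    """
    context: List[str] = []
    for index, sentence in enumerate(sentences):
        if index in chain_sentence_ids:
            context.extend(sentence)
    return context


# Toy document: sentences 0 and 2 mention the same person ("John Smith", "He").
sentences = [
    ["John", "Smith", "joined", "the", "company", "in", "2001", "."],
    ["The", "weather", "was", "mild", "that", "year", "."],
    ["He", "later", "became", "its", "chief", "engineer", "."],
]
tokens = [token for sentence in sentences for token in sentence]

window = window_context(tokens, mention_index=0, size=3)
chain = coreference_context(sentences, chain_sentence_ids={0, 2})
```

In this toy document the fixed window around the first mention stops inside the first sentence, while the coreference-based context skips the unrelated second sentence and picks up the third, which contains a pronoun referring to the same person. The filters for different levels of coreference-based context mentioned in the abstract are not modeled here.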