Conference: Annual Meeting of the Association for Computational Linguistics

Distant Learning for Entity Linking with Automatic Noise Detection


Abstract

Accurate entity linkers have been produced for domains and languages where annotated data (i.e., texts linked to a knowledge base) is available. However, little progress has been made in settings where no labeled data, or only a very limited amount, is present (e.g., legal or most scientific domains). In this work, we show how we can learn to link mentions without any labeled examples, using only a knowledge base and a collection of unannotated texts from the corresponding domain. To achieve this, we frame the task as a multi-instance learning problem and rely on surface matching to create initial noisy labels. Because the learning signal is weak and our surrogate labels are noisy, we introduce a noise detection component in our model: it lets the model detect and disregard examples that are likely to be noisy. Our method, which jointly learns to detect noise and link entities, greatly outperforms the surface matching baseline. For a subset of entity categories, it even approaches the performance of supervised learning.
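
The two ideas named in the abstract, surface matching to build noisy candidate sets and a multi-instance objective that can disregard noisy examples, can be illustrated with a minimal Python sketch. This is not the authors' model: the toy knowledge base `KB`, the functions `candidates` and `mil_loss`, and the `p_noise` weight are all hypothetical names chosen for illustration, and the entity scores stand in for whatever a trained model would produce.

```python
import math

# Hypothetical toy knowledge base: entity id -> known surface names.
KB = {
    "Q1": ["Paris"],
    "Q2": ["Paris Hilton", "Paris"],
    "Q3": ["London"],
}

def candidates(mention, kb):
    """Surface matching: ids of entities with a name equal to the mention
    (case-insensitive). The result is a noisy candidate set, not a gold label."""
    m = mention.lower()
    return sorted(e for e, names in kb.items()
                  if any(m == n.lower() for n in names))

def softmax(scores):
    """Turn a dict of entity scores into a probability distribution."""
    mx = max(scores.values())
    exps = {e: math.exp(s - mx) for e, s in scores.items()}
    z = sum(exps.values())
    return {e: v / z for e, v in exps.items()}

def mil_loss(scores, positive_set, p_noise=0.0):
    """Multi-instance style loss (an assumption for illustration): negative
    log of the probability mass on the candidate set, down-weighted by
    (1 - p_noise) so that examples judged noisy contribute less."""
    probs = softmax(scores)
    mass = sum(probs[e] for e in positive_set)
    return -(1.0 - p_noise) * math.log(mass)

# Usage: the mention "paris" matches two entities; the loss only asks that
# the model put probability on *some* member of that set.
cands = candidates("paris", KB)
uniform_scores = {e: 0.0 for e in KB}
loss = mil_loss(uniform_scores, cands)
```

In this sketch the loss never names a single correct entity, which is the sense in which surface-matched labels act as weak supervision; setting `p_noise` close to 1 effectively removes an example from training.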
