Is a Common Phrase an Entity Mention or Not? Dual Representations for Domain-Specific Named Entity Recognition

Abstract

Named Entity Recognition (NER) for specific domains is critical for building and managing domain-specific knowledge bases, but conventional NER methods cannot be applied to specific domains effectively. We found that one of the reasons is the problem of common-phrase-like entity mentions, which are prevalent in many domains. That is, many common phrases that occur frequently in general corpora may or may not be treated as named entities in a specific domain. Therefore, determining whether a common phrase is an entity mention is a challenge. To address this issue, we present a novel BLSTM-based NER model tailored for specific domains that learns dual representations for each word. It learns not only general domain knowledge, derived from an external large-scale general corpus via a word embedding model, but also specific domain knowledge, by training a stacked deep neural network (SDNN) that integrates the results of a low-cost pre-entity-linking process. Extensive experiments on a real-world dataset of movie comments demonstrate the superiority of our model over existing state-of-the-art methods.
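To make the idea of a dual representation concrete, the following is a minimal sketch under stated assumptions, not the authors' implementation. It assumes the two views of a word are simply concatenated, and it stands in for the pre-entity-linking step with exact matching against a small title list. The names `GENERAL_EMB`, `MOVIE_TITLES`, `pre_link`, and `dual_representation` are hypothetical, the embedding values are toy numbers, and the SDNN and BLSTM tagger described in the abstract are not modeled here.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical toy "general-domain" embeddings. In practice these would be
# learned from a large general corpus by a word embedding model.
GENERAL_EMB: Dict[str, List[float]] = {
    "i": [0.1, 0.0],
    "loved": [0.7, 0.2],
    "frozen": [0.3, 0.9],
    "yesterday": [0.2, 0.1],
}
UNK: List[float] = [0.0, 0.0]  # fallback vector for out-of-vocabulary words

# Hypothetical domain knowledge base: movie titles, tokenized and lowercased.
MOVIE_TITLES: List[Tuple[str, ...]] = [("frozen",), ("the", "dark", "knight")]


def pre_link(tokens: Sequence[str],
             titles: Sequence[Tuple[str, ...]]) -> List[List[float]]:
    """Cheap stand-in for pre-entity-linking (an assumption of this sketch).

    Returns one [begins_candidate, inside_candidate] flag pair per token,
    set by exact, case-insensitive matching against the title list.
    """
    flags = [[0.0, 0.0] for _ in tokens]
    lowered = [t.lower() for t in tokens]
    for title in titles:
        n = len(title)
        for i in range(len(lowered) - n + 1):
            if tuple(lowered[i:i + n]) == title:
                flags[i][0] = 1.0              # first token of a candidate
                for j in range(i + 1, i + n):
                    flags[j][1] = 1.0          # remaining tokens of it
    return flags


def dual_representation(tokens: Sequence[str]) -> List[List[float]]:
    """Concatenate the general embedding with the domain flags per token."""
    flags = pre_link(tokens, MOVIE_TITLES)
    return [GENERAL_EMB.get(t.lower(), UNK) + f for t, f in zip(tokens, flags)]


reps = dual_representation(["I", "loved", "Frozen", "yesterday"])
```

In this toy example the common word "Frozen" keeps its general-corpus vector but also carries a domain flag marking it as a candidate movie title, which is the kind of signal a downstream sequence tagger could use to decide whether the phrase is an entity mention in context.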
