Applied Sciences

An ERNIE-Based Joint Model for Chinese Named Entity Recognition

Abstract

Named Entity Recognition (NER) is a fundamental task in Natural Language Processing (NLP) and the initial step in building a Knowledge Graph (KG). Recently, BERT (Bidirectional Encoder Representations from Transformers), a pre-training model, has achieved state-of-the-art (SOTA) results on various NLP tasks, including NER. However, Chinese NER remains challenging for BERT because there are no physical separations between Chinese words, so BERT can only obtain representations of individual Chinese characters. Character-level representations alone cannot handle Chinese NER well, because the meaning of a Chinese word often differs substantially from the meanings of the characters that compose it. ERNIE (Enhanced Representation through kNowledge IntEgration), an improved pre-training model based on BERT, is better suited to Chinese NER because it is designed to learn language representations enhanced by a knowledge masking strategy. However, the potential of ERNIE has not been fully explored: when performing the NER task, ERNIE utilizes only token-level features and ignores the sentence-level feature. In this paper, we propose ERNIE-Joint, a joint model based on ERNIE. ERNIE-Joint can utilize both sentence-level and token-level features by jointly training the NER and text classification tasks. To use raw NER datasets for joint training without additional annotation, we derive the text classification labels from the number of entities in each sentence. Experiments are conducted on two datasets, MSRA-NER and Weibo, which contain Chinese news data and Chinese social media data, respectively. The results demonstrate that ERNIE-Joint not only outperforms BERT and ERNIE but also achieves SOTA results on both datasets.
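The two ideas in the abstract, deriving sentence-level labels from the raw NER annotations and jointly training a token-level head and a sentence-level head on a shared ERNIE encoder, can be illustrated with a short sketch. The following is a minimal, hypothetical PyTorch version of that setup, not the authors' released code: the checkpoint name (nghuyong/ernie-1.0-base-zh, a community port of ERNIE to HuggingFace transformers), the entity-count buckets, and the loss weight alpha are all assumptions for illustration, as the abstract does not specify them.

```python
# Minimal sketch of the joint idea from the abstract; NOT the paper's
# implementation. Assumes PyTorch, HuggingFace transformers, and a
# community ERNIE checkpoint (all assumptions, see lead-in above).
import torch
import torch.nn as nn
from transformers import AutoModel

def sentence_label_from_bio(bio_tags, max_bucket=2):
    """Derive a text-classification label from a raw NER tag sequence:
    count entities (number of B-* tags) and clip the count into buckets
    (0, 1, 2+ here; the bucketing is an illustrative assumption), so the
    joint task needs no additional annotation."""
    n_entities = sum(1 for t in bio_tags if t.startswith("B-"))
    return min(n_entities, max_bucket)

class ErnieJointSketch(nn.Module):
    """Shared ERNIE encoder with two heads: a token-level NER head and a
    sentence-level classification head over the [CLS] representation."""
    def __init__(self, n_ner_tags, n_cls_labels,
                 checkpoint="nghuyong/ernie-1.0-base-zh"):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(checkpoint)
        hidden = self.encoder.config.hidden_size
        self.ner_head = nn.Linear(hidden, n_ner_tags)    # token-level
        self.cls_head = nn.Linear(hidden, n_cls_labels)  # sentence-level

    def forward(self, input_ids, attention_mask,
                ner_labels=None, cls_labels=None, alpha=0.5):
        out = self.encoder(input_ids=input_ids, attention_mask=attention_mask)
        ner_logits = self.ner_head(out.last_hidden_state)        # per token
        cls_logits = self.cls_head(out.last_hidden_state[:, 0])  # [CLS]
        loss = None
        if ner_labels is not None and cls_labels is not None:
            ce = nn.CrossEntropyLoss(ignore_index=-100)
            ner_loss = ce(ner_logits.view(-1, ner_logits.size(-1)),
                          ner_labels.view(-1))
            cls_loss = ce(cls_logits, cls_labels)
            # Joint objective: weighted sum of the two task losses
            # (alpha is an assumed hyperparameter, not from the paper).
            loss = alpha * ner_loss + (1 - alpha) * cls_loss
        return ner_logits, cls_logits, loss
```

In a sketch like this, backpropagating the combined loss updates the shared encoder from both tasks, which is how sentence-level signal can reach the token-level NER head; the weighting between the two losses would be a tuning choice.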