Published in: International Conference on Computational Linguistics

Joint Chinese Word Segmentation and Part-of-speech Tagging via Multi-channel Attention of Character N-grams

Abstract

Chinese word segmentation (CWS) and part-of-speech (POS) tagging are two fundamental tasks for Chinese language processing. Previous studies have demonstrated that performing them jointly can be an effective one-step solution to both tasks, and that this joint task can benefit from good modeling of contextual features such as n-grams. However, their work on modeling such contextual features is limited to concatenating the features or their embeddings directly with the input embeddings, without distinguishing whether the contextual features are important for the joint task in the specific context. Therefore, their models for the joint task could be misled by unimportant contextual information. In this paper, we propose a character-based neural model for the joint task, enhanced by multi-channel attention of n-grams. In the attention module, n-gram features are categorized into different groups according to several criteria, and the n-grams in each group are weighted and distinguished according to their importance for the joint task in the specific context. To categorize n-grams, we try two criteria in this study, i.e., n-gram frequency and length, so that n-grams with different capabilities of carrying contextual information are discriminatively learned by our proposed attention module. Experimental results on five benchmark datasets for CWS and POS tagging demonstrate that our approach outperforms strong baseline models and achieves state-of-the-art performance on all five datasets.
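The grouping-then-weighting idea in the attention module can be illustrated with a small sketch. This is not the authors' implementation: it is plain Python with no training, and every name in it (`ngrams_covering`, `multi_channel_attention`, `length_channel`, `frequency_channel`, `enhance`, the toy lexicon and vectors) is hypothetical. It assumes dot-product scores between a character's hidden vector and each n-gram embedding, a separate softmax inside each channel, and, as a further assumption, that the channel outputs are summed and concatenated with the character's hidden vector.

```python
import math
from typing import Callable, Dict, List, Sequence

Vector = List[float]


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def softmax(scores: Sequence[float]) -> List[float]:
    # Subtract the max score for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def ngrams_covering(sentence: str, i: int, lexicon: Dict[str, Vector], max_n: int) -> List[str]:
    """Hypothetical helper: lexicon n-grams (length 1..max_n) that contain character i."""
    found = []
    for n in range(1, max_n + 1):
        # Start positions whose n-gram spans index i and stays inside the sentence.
        for start in range(max(0, i - n + 1), min(i, len(sentence) - n) + 1):
            gram = sentence[start:start + n]
            if gram in lexicon:
                found.append(gram)
    return list(dict.fromkeys(found))  # drop repeats, keep order


def length_channel(max_n: int) -> Callable[[str], int]:
    # Length criterion: channel c holds n-grams of length c + 1.
    return lambda g: min(len(g), max_n) - 1


def frequency_channel(counts: Dict[str, int], thresholds: Sequence[int]) -> Callable[[str], int]:
    # Frequency criterion: with ascending thresholds, the channel index is the
    # number of thresholds the n-gram's count reaches (len(thresholds) + 1 channels).
    return lambda g: sum(counts.get(g, 0) >= t for t in thresholds)


def multi_channel_attention(h: Vector, grams: List[str], lexicon: Dict[str, Vector],
                            channel_of: Callable[[str], int], num_channels: int) -> List[Vector]:
    """Attend separately over the n-grams of each channel; empty channels give zeros."""
    dim = len(h)
    outputs = []
    for c in range(num_channels):
        members = [g for g in grams if channel_of(g) == c]
        if not members:
            outputs.append([0.0] * dim)
            continue
        # Weights are normalized within this channel only.
        weights = softmax([dot(h, lexicon[g]) for g in members])
        outputs.append([sum(w * lexicon[g][k] for w, g in zip(weights, members))
                        for k in range(dim)])
    return outputs


def enhance(h: Vector, channel_outputs: List[Vector]) -> Vector:
    # Assumed combination: sum the channel vectors, then concatenate with h.
    summed = [sum(vals) for vals in zip(*channel_outputs)]
    return h + summed


# Toy usage with made-up 2-d embeddings for the sentence "中国人".
lexicon = {"中": [1.0, 0.0], "国": [0.0, 1.0], "中国": [0.5, 0.5], "中国人": [0.25, 0.75]}
h = [1.0, 0.0]  # stand-in for the encoder's hidden vector of "国"
grams = ngrams_covering("中国人", 1, lexicon, max_n=3)
outs = multi_channel_attention(h, grams, lexicon, length_channel(3), num_channels=3)
print(grams)              # ['国', '中国', '中国人']
print(enhance(h, outs))   # [1.0, 0.0, 0.75, 2.25]
```

One reading of the design choice: because each channel has its own softmax, n-grams from one group (for example very frequent or very short ones) do not compete for attention mass with n-grams from another group, so each group contributes its own weighted summary of the context.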
