An Empirical Study of Automatic Chinese Word Segmentation for Spoken Language Understanding and Named Entity Recognition



Abstract

Word segmentation is usually recognized as the first step for many Chinese natural language processing tasks, yet its impact on these subsequent tasks is relatively under-studied. For example, how can the mismatch problem be solved when an existing word segmenter is applied to new data? Does a better word segmenter yield better performance on the subsequent NLP task? In this work, we make an initial attempt to answer these questions on two related subsequent tasks: semantic slot filling in spoken language understanding and named entity recognition. We propose three techniques to solve the mismatch problem: using word segmentation outputs as additional features, adaptation with partial learning, and taking advantage of the n-best word segmentation list. Experimental results demonstrate the effectiveness of these techniques for both tasks, and we achieve an error reduction of about 11% for spoken language understanding and 24% for named entity recognition over the baseline systems.
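The first technique, using word segmentation outputs as additional features, can be illustrated with a small sketch. The abstract does not give the paper's feature templates or tagger, so everything below is an assumption made for illustration: the B/M/E/S position-tag scheme, the function names `segmentation_tags` and `char_features`, and the feature keys are hypothetical and not taken from the paper. The idea shown is that a character-based sequence labeler keeps its character features and receives the segmenter's word-boundary decision as one more feature, rather than being forced to commit to the segmenter's words.

```python
from typing import Dict, List


def segmentation_tags(words: List[str]) -> List[str]:
    """Map a segmented sentence to one B/M/E/S position tag per character.

    B = first character of a multi-character word, M = middle,
    E = last, S = single-character word. (Assumed tag scheme.)
    """
    tags: List[str] = []
    for word in words:
        if not word:
            continue  # ignore empty tokens from a noisy segmenter
        if len(word) == 1:
            tags.append("S")
        else:
            tags.extend(["B"] + ["M"] * (len(word) - 2) + ["E"])
    return tags


def char_features(words: List[str]) -> List[Dict[str, str]]:
    """Build per-character feature dicts for a character-based tagger.

    The segmenter output enters only through the "seg_tag" feature, so a
    downstream model can learn how much to trust it.
    """
    chars = [ch for word in words for ch in word]
    tags = segmentation_tags(words)
    features: List[Dict[str, str]] = []
    for i, (ch, tag) in enumerate(zip(chars, tags)):
        features.append({
            "char": ch,
            "prev_char": chars[i - 1] if i > 0 else "<BOS>",
            "next_char": chars[i + 1] if i < len(chars) - 1 else "<EOS>",
            "seg_tag": tag,  # additional feature from the word segmenter
        })
    return features
```

Feature dictionaries of this shape could be passed to any sequence labeler that accepts per-token feature maps, such as a CRF toolkit. The choice of labeler is left open here because the abstract does not name one.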


Bibliographic details

Source
Conference on the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2016, pp. 238-248 (11 pages)
Authors
Wencan Luo; Fan Yang
Original format: PDF

Similar literature

1. Chinese word segmentation and named entity recognition: A pragmatic approach [J]. Gao JF, Li M, Wu A. Computational Linguistics, 2005, no. 4.
2. Universal attribute characterization of spoken languages for automatic spoken language recognition [J]. Sabato Marco Siniscalchi, Jeremy Reed, Torbjorn Svendsen. Computer Speech and Language, 2013, no. 1.
3. An Empirical Study of Automatic Chinese Word Segmentation for Spoken Language Understanding and Named Entity Recognition [C]. Wencan Luo, Fan Yang. Conference on the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2016.
4. An Application of Natural Language Processing: Named Entity Recognition with BLSTM in Chinese Corpora [D]. Mao, Lihui. 2019.
5. Joint segmentation and named entity recognition using dual decomposition in Chinese discharge summaries [O]. Yan Xu, Yining Wang, Tianren Liu. 2014.
6. An Empirical Study of Automatic Chinese Word Segmentation for Spoken Language Understanding and Named Entity Recognition [O]. Wencan Luo, Fan Yang. 2016.
