Chinese Word Segmentation as POC-NLW Tagging



Abstract

Disambiguation and unknown-word identification remain the two key open problems in Chinese word segmentation. To handle both in a uniform way, this paper presents a language tagging template, named POC-NLW, to explore the word-formation mechanisms of the Chinese language at the character level. Based on this template, a tagger built on a Hidden Markov Model implements word segmentation as character tagging. In this method, basic word segmentation, disambiguation, and unknown-word identification are integrated and accomplished in a single unified process. Experimental results on the SIGHAN Bakeoff 2005 corpus show that the method achieves high accuracy on word segmentation, especially on unknown-word identification, with appreciable processing efficiency. The method interoperates well with, and extends to, different kinds of words, so it is suitable for practical Chinese information processing applications.
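To make the idea of "word segmentation as character tagging" concrete, the following is a minimal sketch of an HMM tagger decoded with the Viterbi algorithm. It is not the paper's method: the abstract does not define the POC-NLW tag set, so the sketch assumes the common four-tag position-of-character scheme (B, M, E, S) as a stand-in. The three-sentence training corpus, the add-alpha smoothing, and all function names are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter

# Assumed stand-in tag set (NOT the paper's POC-NLW template):
# B = begin, M = middle, E = end of a multi-character word; S = single-character word.
TAGS = "BMES"
LEGAL = {"B": "ME", "M": "ME", "E": "BS", "S": "BS"}  # allowed next tags
START = "BS"                                          # allowed first tags

def words_to_tags(words):
    """Convert a segmented sentence into one tag per character."""
    tags = []
    for w in words:
        tags.extend("S" if len(w) == 1 else "B" + "M" * (len(w) - 2) + "E")
    return tags

def tags_to_words(chars, tags):
    """Rebuild words from characters and their tags (a word closes on E or S)."""
    words, current = [], ""
    for ch, tag in zip(chars, tags):
        current += ch
        if tag in "ES":
            words.append(current)
            current = ""
    if current:  # tolerate a sequence that ends mid-word
        words.append(current)
    return words

def train(corpus, alpha=1.0):
    """Estimate smoothed log-probabilities from a list of segmented sentences."""
    start, trans, emit, tag_count = Counter(), Counter(), Counter(), Counter()
    vocab = set()
    for words in corpus:
        chars, tags = "".join(words), words_to_tags(words)
        start[tags[0]] += 1
        for i, (ch, tag) in enumerate(zip(chars, tags)):
            emit[tag, ch] += 1
            tag_count[tag] += 1
            vocab.add(ch)
            if i > 0:
                trans[tags[i - 1], tag] += 1
    v = len(vocab) + 1  # one extra slot so unseen characters get non-zero mass

    def log_start(t):
        if t not in START:
            return -math.inf
        return math.log((start[t] + alpha) / (sum(start.values()) + alpha * len(START)))

    def log_trans(p, t):
        if t not in LEGAL[p]:
            return -math.inf
        total = sum(trans[p, x] for x in LEGAL[p])
        return math.log((trans[p, t] + alpha) / (total + alpha * len(LEGAL[p])))

    def log_emit(t, ch):
        return math.log((emit[t, ch] + alpha) / (tag_count[t] + alpha * v))

    return log_start, log_trans, log_emit

def viterbi(chars, model):
    """Return the most probable legal tag sequence for a string of characters."""
    if not chars:
        return []
    log_start, log_trans, log_emit = model
    score = [{t: log_start(t) + log_emit(t, chars[0]) for t in TAGS}]
    back = []
    for ch in chars[1:]:
        prev, row, ptr = score[-1], {}, {}
        for t in TAGS:
            best = max(TAGS, key=lambda p: prev[p] + log_trans(p, t))
            row[t] = prev[best] + log_trans(best, t) + log_emit(t, ch)
            ptr[t] = best
        score.append(row)
        back.append(ptr)
    last = max("ES", key=lambda t: score[-1][t])  # a sentence must end a word
    tags = [last]
    for ptr in reversed(back):
        tags.append(ptr[tags[-1]])
    return tags[::-1]

def segment(sentence, model):
    return tags_to_words(sentence, viterbi(sentence, model))

# Hypothetical toy corpus; a real system would train on a corpus such as SIGHAN Bakeoff 2005.
TOY_CORPUS = [["我", "爱", "北京"], ["北京", "大学"], ["我", "在", "大学"]]
MODEL = train(TOY_CORPUS)
```

Calling `segment("我爱大学", MODEL)` returns a list of words whose concatenation is the input string. Because every character receives a tag regardless of whether it appeared in training, characters from unknown words are handled by the same decoding pass as known ones, which is the sense in which a tagging formulation unifies segmentation and unknown-word identification.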
