Journal: International Journal of Computer Processing of Oriental Languages

Chinese Unknown Word Identification Based on Local Bigram Model

Abstract

This paper presents a Chinese unknown word identification system based on a local bigram model. Our word segmentation system generally relies on a statistical unigram model. To identify unknown words, however, we exploit their contextual information and apply a bigram model locally. We combine the two models, which are of different order, by adjusting an interpolation weight derived from a smoothing method. As a simplification of the full bigram model, the method is both simple and feasible: its algorithmic complexity is low and it needs relatively little training data. Our experimental results show that the approach is effective.
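The abstract does not give the formulas, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes that the smoothing method is plain linear interpolation, P(w | prev) = lam * P_bigram(w | prev) + (1 - lam) * P_unigram(w), and that "locally" means the bigram term is used only at positions flagged as unknown word candidates. The names `train_counts`, `interpolated_prob`, `score`, `local_positions`, and `lam` are hypothetical, as is the add-one smoothing of the unigram model.

```python
import math
from collections import Counter


def train_counts(sentences):
    """Count words and adjacent word pairs in pre-segmented sentences."""
    unigrams, bigrams = Counter(), Counter()
    for words in sentences:
        unigrams.update(words)
        bigrams.update(zip(words, words[1:]))
    return unigrams, bigrams


def unigram_prob(word, unigrams, total, vocab_size):
    # Add-one smoothing (an assumption) keeps unseen words above zero.
    return (unigrams[word] + 1) / (total + vocab_size + 1)


def interpolated_prob(prev, word, unigrams, bigrams, total, vocab_size, lam):
    """Linear interpolation of a bigram estimate with the unigram estimate."""
    p_uni = unigram_prob(word, unigrams, total, vocab_size)
    # Maximum-likelihood bigram estimate; 0.0 if the context was never seen.
    p_bi = bigrams[(prev, word)] / unigrams[prev] if unigrams[prev] else 0.0
    return lam * p_bi + (1.0 - lam) * p_uni


def score(words, local_positions, unigrams, bigrams, lam=0.5):
    """Log-probability of one candidate segmentation.

    Positions in `local_positions` (hypothetical: where an unknown word
    is suspected) use the interpolated bigram; all others use the unigram.
    """
    if not 0.0 <= lam < 1.0:
        raise ValueError("lam must be in [0, 1) so probabilities stay positive")
    total = sum(unigrams.values())
    vocab_size = len(unigrams)
    logp = 0.0
    for i, word in enumerate(words):
        if i > 0 and i in local_positions:
            p = interpolated_prob(words[i - 1], word, unigrams, bigrams,
                                  total, vocab_size, lam)
        else:
            p = unigram_prob(word, unigrams, total, vocab_size)
        logp += math.log(p)
    return logp
```

With `lam = 0` the score reduces to the pure unigram model, and raising `lam` gives more weight to the local context. A segmenter built this way would compare `score` across candidate segmentations and keep the highest one.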
