ISCAS: A Cascaded Approach for CIPS-SIGHAN Micro-Blog Word Segmentation Bakeoff 2012 Track

Published in: CIPS-SIGHAN Joint Conference on Chinese Language Processing

Abstract

State-of-the-art Chinese word segmentation systems achieve high performance on well-formed, long documents. Segmenting micro-blog text, however, is difficult because of noise and out-of-vocabulary (OOV) words. In this paper, we present a Chinese micro-blog segmentation system for the CIPS-SIGHAN Word Segmentation Bakeoff 2012 micro-blog track. The system takes a cascaded approach with three steps: preprocessing, word segmentation, and post-processing. In the preprocessing step, noise containing special characters is processed and removed. The remaining sentences are then segmented in the second step. Finally, a dictionary is used to detect OOV words that were not segmented correctly. The results show that the approach achieves competitive performance.
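The three-step cascade described above can be sketched as a chain of functions. This is a minimal illustration under stated assumptions and is not the authors' system. The abstract does not say which special characters count as noise, which segmenter runs in step two, or what dictionary drives post-processing. In the sketch below, the noise patterns (URLs, @mentions, bracketed emoticon codes) are hypothetical. The forward-maximum-matching segmenter is a placeholder, and the helper names `preprocess`, `segment`, `postprocess` and `run_pipeline` are invented for illustration. Post-processing is modeled as merging runs of adjacent fragments whose concatenation appears in an assumed OOV dictionary.

```python
import re

# Step 1 noise patterns. ASSUMPTION: the abstract only says that noise
# containing special characters is removed; these three are illustrative.
NOISE_PATTERNS = [
    re.compile(r"https?://\S+"),      # URLs
    re.compile(r"@[\w\-]+"),          # @mentions (\w matches CJK in Python 3)
    re.compile(r"\[[^\[\]]{1,8}\]"),  # bracketed emoticon codes, e.g. [哈哈]
]


def preprocess(text: str) -> str:
    """Step 1: replace noise spans with spaces, then collapse whitespace."""
    for pattern in NOISE_PATTERNS:
        text = pattern.sub(" ", text)
    return " ".join(text.split())


def segment(text: str, lexicon: set[str], max_len: int = 4) -> list[str]:
    """Step 2 (hypothetical stand-in): forward maximum matching.

    Characters not covered by the lexicon fall back to single-character
    tokens, which is how OOV words end up split into fragments.
    """
    tokens: list[str] = []
    i = 0
    while i < len(text):
        if text[i].isspace():
            i += 1
            continue
        for n in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + n]
            if n == 1 or piece in lexicon:
                tokens.append(piece)
                i += n
                break
    return tokens


def postprocess(tokens: list[str], oov_dict: set[str], max_span: int = 4) -> list[str]:
    """Step 3: greedily merge the longest run of adjacent tokens whose
    concatenation is in the (hypothetical) OOV dictionary."""
    merged: list[str] = []
    i = 0
    while i < len(tokens):
        for span in range(min(max_span, len(tokens) - i), 0, -1):
            candidate = "".join(tokens[i:i + span])
            if span == 1 or candidate in oov_dict:
                merged.append(candidate)
                i += span
                break
    return merged


def run_pipeline(raw: str, lexicon: set[str], oov_dict: set[str]) -> list[str]:
    """Cascade the three steps in order."""
    return postprocess(segment(preprocess(raw), lexicon), oov_dict)


if __name__ == "__main__":
    lexicon = {"今天", "很", "开心"}  # toy lexicon that lacks 微博
    oov_dict = {"微博"}               # toy OOV dictionary
    print(run_pipeline("@小明 今天刷微博很开心 http://t.cn/abc", lexicon, oov_dict))
```

Running the script prints `['今天', '刷', '微博', '很', '开心']`. Forward maximum matching alone leaves 微博 ("micro-blog") split as 微 / 博, and the dictionary-based post-processing step merges the fragments back into one word. Greedy longest-match merging is the simplest choice here. A fuller system would need some guard against false merges across genuine word boundaries.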
