In Chinese word segmentation based on character-position tagging with Conditional Random Fields (CRFs), the width of the feature window is an important parameter that determines how well the CRF learns. To choose the best window width, we designed a set of feature templates and ran comparative experiments with the CRF++ 0.53 toolkit on test corpora selected from Bakeoff2005. These experiments quantify the range of context that effectively influences segmentation. They lead to two conclusions: (1) the following context contributes more to segmentation performance than the preceding context; (2) the feature window width that affects segmentation performance is at most five characters, and a window of four or five characters is appropriate.
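The following is a minimal Python sketch of this kind of setup, written under stated assumptions. The 4-tag B/M/E/S character-position scheme and the template set are illustrative choices and are not the templates used in the experiments above. The template set has one single-character feature per offset and one bigram per adjacent character pair. The helper names `tag_words` and `window_templates` are hypothetical. The sketch converts a segmented sentence into per-character tags. It also writes CRF++ feature templates for a window that spans `left` preceding and `right` following characters, so the window width is `left + right + 1`.

```python
def tag_words(words):
    """Map a segmented sentence (list of words) to (char, tag) pairs.

    Assumption: the common 4-tag scheme B (begin), M (middle),
    E (end), S (single-character word).
    """
    pairs = []
    for w in words:
        if not w:
            continue  # skip empty tokens defensively
        if len(w) == 1:
            pairs.append((w, "S"))
        else:
            pairs.append((w[0], "B"))
            for ch in w[1:-1]:
                pairs.append((ch, "M"))
            pairs.append((w[-1], "E"))
    return pairs


def window_templates(left, right):
    """Build a CRF++ template file for a window from -left to +right.

    Hypothetical template set: one unigram feature %x[k,0] per offset k,
    plus one feature per adjacent pair %x[k,0]/%x[k+1,0], and the
    output-label bigram template "B".
    """
    lines = []
    idx = 0
    for k in range(-left, right + 1):
        lines.append(f"U{idx:02d}:%x[{k},0]")
        idx += 1
    for k in range(-left, right):
        lines.append(f"U{idx:02d}:%x[{k},0]/%x[{k + 1},0]")
        idx += 1
    lines.append("")   # blank separator line, allowed in CRF++ templates
    lines.append("B")  # bigram over output labels
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    for ch, tag in tag_words(["我们", "是", "研究生"]):
        print(f"{ch}\t{tag}")
    print(window_templates(1, 2))
```

For example, `window_templates(2, 2)` gives a symmetric five-character window. `window_templates(1, 2)` gives a four-character window that leans toward the following context, which matches the finding that following context matters more. CRF++ reads training data in a column format: each tagged pair goes on its own `char<TAB>tag` line, and a blank line separates sentences.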