Journal: 《计算机应用研究》 (Application Research of Computers)

A Compound Word Extraction Method Combining Location Tags and Part of Speech

         

Abstract

Existing word segmentation systems cannot add new words to their lexicons promptly, so they fail to identify domain-specific compound words effectively. To address this problem, this paper proposes a compound word extraction method that combines location tags with part-of-speech (POS) information. First, the method builds a location tag set for each term by preprocessing the corpus text, attaching location tags, and filtering terms by weighted term frequency. Next, using the location tag sets, it computes the adjacency degree of terms within sentences to decide which terms form compound words. Finally, it formulates reverse rules to filter the extraction results, and it further identifies compound words in garbage strings by progressively trimming terms from both ends and judging the remainder again. Experiments on different corpora show that the method achieves higher precision.
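The first two stages described above (location tag sets, then adjacency between terms) can be illustrated with a minimal sketch. The abstract does not give the paper's formulas, so everything here is an assumption made for illustration: a location tag is taken to be a (sentence index, token index) pair, a plain frequency threshold stands in for the weighted term frequency filter, and `adjacency_degree` is a hypothetical ratio, not the paper's definition. POS information, reverse rules, and the end-trimming step are omitted.

```python
from collections import defaultdict


def build_position_tags(sentences):
    """Map each token to the set of (sentence_index, token_index) positions it occupies."""
    tags = defaultdict(set)
    for s_idx, tokens in enumerate(sentences):
        for t_idx, token in enumerate(tokens):
            tags[token].add((s_idx, t_idx))
    return tags


def adjacency_degree(tags, left, right):
    """Hypothetical measure: count of places where `left` is immediately
    followed by `right`, divided by the frequency of the rarer token."""
    left_pos = tags.get(left, set())
    right_pos = tags.get(right, set())
    if not left_pos or not right_pos:
        return 0.0
    adjacent = sum(1 for (s, t) in left_pos if (s, t + 1) in right_pos)
    return adjacent / min(len(left_pos), len(right_pos))


def extract_candidates(sentences, min_freq=2, threshold=0.5):
    """Return adjacent token pairs that pass both illustrative filters."""
    tags = build_position_tags(sentences)
    # Plain frequency filter; an assumed stand-in for weighted term frequency.
    frequent = {tok for tok, pos in tags.items() if len(pos) >= min_freq}
    candidates = set()
    for tokens in sentences:
        for a, b in zip(tokens, tokens[1:]):
            if a in frequent and b in frequent:
                if adjacency_degree(tags, a, b) >= threshold:
                    candidates.add((a, b))
    return candidates


# Toy input: each sentence is already segmented into tokens.
sentences = [
    ["location", "tag", "method"],
    ["location", "tag", "set"],
    ["pos", "tag"],
]
candidates = extract_candidates(sentences)
```

On this toy input, "location" and "tag" each occur at least twice and are adjacent in every occurrence of "location", so the pair is kept; the other pairs fail the frequency filter. The thresholds `min_freq` and `threshold` are arbitrary illustrative values.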
