Conference: IEEE Conference on Energy Internet and Energy System Integration

A Chinese Word Segmentation Model for Energy Literature Based on Conditional Random Fields


Abstract

Chinese word segmentation is one of the foundational, core tasks of Chinese natural language processing. Although segmentation systems have made progress in general domains, they remain far from meeting practical requirements in the energy domain. We focus on a Chinese word segmentation standard and segmentation technology for the energy domain, which comprises 13,283 basic energy terms. This paper first proposes a conditional random field (CRF) segmentation model. Then, the character features, character type features, and conditional entropy features that influence word segmentation performance are selected and described. Finally, the proposed model is tested on a dataset of State Grid energy literature and compared with current word segmentation tools, such as the Harbin Institute of Technology's Language Technology Platform (LTP) and Tsinghua University's THU Lexical Analyzer for Chinese (THULAC). The best F1 score achieved by the proposed model is 0.8319.
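As a rough illustration of the setup the abstract describes, the sketch below shows how CRF-based segmentation is commonly framed: each character receives a position tag, per-character features are built from a small context window, and output is scored with word-level F1. None of this comes from the paper itself. The B/M/E/S tag set, the one-character window, the coarse character-type categories, and all function names are assumptions made for illustration. The conditional entropy feature is left out because the abstract does not define how it is computed.

```python
import unicodedata


def to_bmes(words):
    """Map a segmented sentence (list of words) to per-character B/M/E/S tags."""
    tags = []
    for w in words:
        if len(w) == 1:
            tags.append("S")  # single-character word
        else:
            tags.extend(["B"] + ["M"] * (len(w) - 2) + ["E"])
    return tags


def char_type(ch):
    """Coarse character type. These categories are an assumption, not the paper's."""
    if ch.isdigit():
        return "DIGIT"
    if ch.isascii() and ch.isalpha():
        return "LATIN"
    if unicodedata.category(ch).startswith("P"):
        return "PUNCT"
    if "\u4e00" <= ch <= "\u9fff":
        return "HAN"
    return "OTHER"


def char_features(chars, i):
    """Character and character-type features for position i (hypothetical template)."""
    feats = {"c0": chars[i], "t0": char_type(chars[i])}
    if i > 0:
        feats["c-1"] = chars[i - 1]
        feats["t-1"] = char_type(chars[i - 1])
        feats["c-1c0"] = chars[i - 1] + chars[i]
    else:
        feats["BOS"] = True  # beginning of sentence
    if i < len(chars) - 1:
        feats["c+1"] = chars[i + 1]
        feats["t+1"] = char_type(chars[i + 1])
        feats["c0c+1"] = chars[i] + chars[i + 1]
    else:
        feats["EOS"] = True  # end of sentence
    return feats


def word_spans(words):
    """Convert a word list to a set of (start, end) character offsets."""
    spans, start = set(), 0
    for w in words:
        spans.add((start, start + len(w)))
        start += len(w)
    return spans


def word_f1(gold, pred):
    """Word-level F1: a predicted word counts only if both boundaries match gold."""
    g, p = word_spans(gold), word_spans(pred)
    correct = len(g & p)
    if correct == 0:
        return 0.0
    precision, recall = correct / len(p), correct / len(g)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    gold = ["智能", "电网", "调度"]
    pred = ["智能", "电网调度"]
    chars = list("".join(gold))
    print(to_bmes(gold))
    print(char_features(chars, 0))
    print(word_f1(gold, pred))
```

The feature dictionaries produced by `char_features` are in the shape that typical CRF toolkits accept as per-token input, so a sequence of them paired with the `to_bmes` tags could serve as one training example.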
