A Chinese Word Segmentation Model for Energy Literature Based on Conditional Random Fields


Abstract

Chinese word segmentation is one of the foundational and core tasks of Chinese natural language processing. Although general-domain Chinese word segmentation systems have made progress, they still fall far short of practical requirements in the energy domain. We focus on the Chinese word segmentation standard and segmentation technology for the energy domain, which comprises 13,283 basic energy terms. This paper first proposes a conditional random field (CRF) segmentation model. Then, the character features, character type features, and conditional entropy features that influence word segmentation performance are selected and described. Finally, the proposed model is tested on the dataset of the State Grid energy literature and compared with current word segmentation tools, such as the Harbin Institute of Technology's Language Technology Platform (LTP) and Tsinghua University's THU Lexical Analyzer for Chinese (THULAC). The best F1 score achieved by the proposed model is 0.8319.
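As an illustration of the ingredients named in the abstract, the following is a minimal Python sketch of how character-based CRF segmentation is commonly set up: words are converted to per-character B/M/E/S labels, each character gets features built from its neighbours and their character types, and segmentations are scored with word-level F1. The tag set, the window size, the character classes, and all function names are assumptions made for illustration; the abstract does not specify the paper's actual feature templates, and the conditional entropy features are omitted because their definition is not given here.

```python
import unicodedata


def char_type(ch):
    """Coarse character class (an assumed scheme, not the paper's)."""
    if ch.isdigit():
        return "DIGIT"
    if ch.isascii() and ch.isalpha():
        return "LATIN"
    if unicodedata.category(ch).startswith("P"):
        return "PUNCT"
    return "CJK"


def to_bmes(words):
    """Convert a segmented sentence into per-character B/M/E/S labels."""
    tags = []
    for w in words:
        if len(w) == 1:
            tags.append("S")
        else:
            tags.extend(["B"] + ["M"] * (len(w) - 2) + ["E"])
    return tags


def char_features(chars, i):
    """Character and character-type features in a +/-1 window around position i."""
    def at(j):
        return chars[j] if 0 <= j < len(chars) else "<PAD>"

    feats = {}
    for off in (-1, 0, 1):
        c = at(i + off)
        feats[f"c[{off}]"] = c
        feats[f"t[{off}]"] = "PAD" if c == "<PAD>" else char_type(c)
    # Character bigrams with the left and right neighbour.
    feats["c[-1]c[0]"] = at(i - 1) + at(i)
    feats["c[0]c[1]"] = at(i) + at(i + 1)
    return feats


def word_spans(words):
    """Set of (start, end) character offsets, one per word."""
    spans, start = set(), 0
    for w in words:
        spans.add((start, start + len(w)))
        start += len(w)
    return spans


def segmentation_f1(gold_words, pred_words):
    """Word-level F1: a predicted word counts only if both boundaries match."""
    gold, pred = word_spans(gold_words), word_spans(pred_words)
    correct = len(gold & pred)
    if correct == 0:
        return 0.0
    precision = correct / len(pred)
    recall = correct / len(gold)
    return 2 * precision * recall / (precision + recall)
```

The feature dictionaries produced by `char_features` are in the shape that typical CRF toolkits accept as per-token input, so a labelled corpus could be turned into training data by pairing them with the output of `to_bmes`.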


Bibliographic details

Source
IEEE Conference on Energy Internet and Energy System Integration | 2018 | 584p | 4 pages
Authors
Liujun Zhao; Weizheng Kong; Bo Chai
Original format: PDF
Chinese Library Classification: TK01-53
Keywords
Entropy; Hidden Markov models; Dictionaries; Labeling; Tools; Natural language processing; Vocabulary


Similar literature


1. Chinese Word Segmentation Based on Conditional Random Field [J]. Junxia Deng, Hong Zhang, Shanzai Li. Machine Learning Research. 2017, No. 3
2. Unsupervised SAR image segmentation using high-order conditional random fields model based on product-of-experts [J]. Zhang Peng, Li Ming, Wu Yan. Pattern Recognition Letters. 2016, No. Jula15
3. Category Level Object Segmentation by Combining Bag-of-Words Models with Dirichlet Processes and Random Fields [J]. Diane Larlus, Jakob Verbeek, Frédéric Jurie. International Journal of Computer Vision. 2010, No. 2
4. A Chinese Word Segmentation Model for Energy Literature Based on Conditional Random Fields [C]. Liujun Zhao, Weizheng Kong, Bo Chai. IEEE Conference on Energy Internet and Energy System Integration. 2018
5. Model-based Single-microphone Speech Separation Using Conditional Random Fields [D]. Yeung, Yu Ting. 2014
6. Left ventricular segmentation from MRI datasets with edge modelling conditional random fields [O]. Janto F Dreijer, Ben M Herbst, Johan A du Preez. 2013
7. Effective Tag Set Selection in Chinese Word Segmentation via Conditional Random Field Modeling [O]. Zhao Hai, Huang Chang-Ning, Li Mu. 2006
