A Microblog New Word Detection Method Integrating Rules and Statistics

周霜霜; 徐金安; 陈钰枫; 张玉洁

Abstract

Microblog new words follow extremely complex and highly dispersed formation rules, and the traditional C/NC-value method extracts them poorly: the boundaries of identified new words are relatively inaccurate, and low-frequency new words are often missed. To address these problems, a method is proposed that integrates heuristic rules, a modified C/NC-value method and a Conditional Random Field (CRF) model. On the one hand, the heuristic rules encode a classification and inductive summary of microblog new words and focus on their components; the rules were summarized manually, from observation of a large number of microblog documents, in terms of Part Of Speech (POS), character types and symbols. On the other hand, to improve boundary accuracy and the detection of low-frequency new words, the traditional C/NC-value method is modified by merging word frequency, branch entropy, mutual information and other statistical features to reconstruct the NC-value objective function. Finally, a CRF model is trained and used to detect new words. Experimental results show that, compared with traditional methods, the proposed method effectively improves the F value of microblog new word detection.
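The abstract names mutual information and branch entropy as statistical features but gives no formulas. The following is a minimal sketch using common textbook definitions of the two statistics over a character corpus; it is not the paper's modified NC-value objective function. The function names (`ngram_counts`, `min_pmi`, `branch_entropy`) and the use of the total character count as a single normalizer are assumptions made for illustration.

```python
import math
from collections import Counter


def ngram_counts(corpus, max_n):
    """Count every character n-gram of length 1..max_n in a list of strings."""
    counts = Counter()
    for sent in corpus:
        for n in range(1, max_n + 1):
            for i in range(len(sent) - n + 1):
                counts[sent[i:i + n]] += 1
    return counts


def min_pmi(candidate, counts, total_chars):
    """Smallest pointwise mutual information over all two-way splits.

    Assumption: every n-gram probability is approximated as
    count / total_chars, which is a simplification.
    `counts` must cover n-grams up to len(candidate).
    """
    if counts[candidate] == 0:
        return -math.inf
    p_whole = counts[candidate] / total_chars
    best = math.inf
    for k in range(1, len(candidate)):
        p_left = counts[candidate[:k]] / total_chars
        p_right = counts[candidate[k:]] / total_chars
        best = min(best, math.log2(p_whole / (p_left * p_right)))
    return best


def branch_entropy(candidate, corpus, side):
    """Shannon entropy (bits) of the characters adjacent to the candidate.

    side is "left" or "right". Occurrences at a string boundary have no
    neighbour on that side and are skipped.
    """
    neighbours = Counter()
    n = len(candidate)
    for sent in corpus:
        start = sent.find(candidate)
        while start != -1:
            idx = start - 1 if side == "left" else start + n
            if 0 <= idx < len(sent):
                neighbours[sent[idx]] += 1
            start = sent.find(candidate, start + 1)
    total = sum(neighbours.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in neighbours.values())
```

Under these definitions, a candidate string with high internal mutual information and high branch entropy on both sides is cohesive inside and varied in context, which is the usual intuition for a well-bounded word. How the paper weights and combines these statistics with word frequency is not stated in the abstract.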

Bibliographic details

Source
Journal of Computer Applications (《计算机应用》) | 2017, No. 4 | pp. 1044-1050 | 7 pages
Authors
周霜霜; 徐金安; 陈钰枫; 张玉洁
Author affiliation

School of Computer and Information Technology, Beijing Jiaotong University, Beijing 100044 (listed for each of the four authors)

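Original format: PDF
Language: Chinese (chi)
CLC classification: Text information processing
Keywords
microblog new word; word formation rule; statistical feature; C/NC-value method; Conditional Random Field (CRF) model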
