A New Word Discovery Method Based on the Contextual Word Frequency-Contextual Word Count Index

邢恩军; 赵富强

Abstract

This article proposes a statistical index based on contextual word frequency and contextual word count (WF-CWC). The index modifies the definition of the parameters in the information entropy formula: the number of times an adjacent string occurs in the corpus is replaced by the size of the set of adjacent strings. This overcomes the weakness that left and right information entropies are not sufficiently distinctive features when identifying new words. The article also proposes a recursive, adjacency-based string concatenation method, which overcomes the fixed sliding window size of the N-gram approach. Empirical analysis indicates that the method achieves relatively high accuracy. By choosing different WF-CWC values as the threshold, the method can be tuned flexibly between discovering more new words and improving the accuracy of the words it discovers, which makes it a practical approach to new word discovery.
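The abstract does not give the WF-CWC formula itself, so the following is only a minimal sketch of the two quantities it contrasts: the conventional right information entropy, which is computed from how often each adjacent string occurs, and the size of the set of distinct adjacent strings. The function names and the toy corpus are assumptions made for illustration and do not come from the paper.

```python
from collections import Counter
from math import log2


def right_neighbors(corpus: str, candidate: str) -> Counter:
    """Count each character that immediately follows `candidate` in `corpus`."""
    counts = Counter()
    start = corpus.find(candidate)
    while start != -1:
        end = start + len(candidate)
        if end < len(corpus):
            counts[corpus[end]] += 1
        # Advance by one character so overlapping occurrences are also seen.
        start = corpus.find(candidate, start + 1)
    return counts


def right_entropy(counts: Counter) -> float:
    """Conventional right information entropy, based on neighbor frequencies."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * log2(c / total) for c in counts.values())


def neighbor_set_size(counts: Counter) -> int:
    """Size of the set of distinct adjacent strings (hypothetical helper).

    The abstract says this quantity replaces the occurrence frequency in the
    entropy formula; how it is combined into WF-CWC is not stated there.
    """
    return len(counts)


# Toy corpus (an assumption): "ab" is followed by x, y, x, z.
neighbors = right_neighbors("abxabyabxabz", "ab")
print(right_entropy(neighbors))      # 1.5
print(neighbor_set_size(neighbors))  # 3
```

The left-hand quantities can be obtained the same way by collecting the character that precedes each occurrence of the candidate string.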

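Bibliographic record

Source
Computer Applications and Software (《计算机应用与软件》) | 2016, No. 6 | pp. 64-67 | 4 pages
Authors
邢恩军; 赵富强
Affiliations

College of Management and Economics, Tianjin University, Tianjin 300072;

Department of Information Science and Technology, Tianjin University of Finance and Economics, Tianjin 300222

Original format: PDF
Language of full text: Chinese
CLC classification: Text information processing
Keywords
new word discovery; contextual information entropy; word frequency-word count index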

